Abstract

In this paper, we study the rate distortion function of the i.i.d. sequence of products of a Bernoulli-$p$ random
variable and a Gaussian random variable $\sim N(0, 1)$. We use a new technique in the derivation of the lower bound, in
which we establish the duality between channel coding and lossy source coding in the strong sense. We improve the
lower bound on the rate distortion function over the best known lower bound by $p\log_2\frac{1}{p}$ if the distortion $D$ is small.
This has some interesting implications for sparse signals, where $p$ is small, since the known gap between the lower
and upper bounds is $H(p)$. This improvement in the lower bound shows that the lower and upper bounds are almost
identical for sparse signals with small distortion, because $\lim_{p\to 0}\frac{p\log_2\frac{1}{p}}{H(p)} = 1$.

I. Bernoulli-Gaussian model and some obvious bounds on its rate distortion functions 
Notations: in this paper we use sans-serif letters $\mathsf{x}, \mathsf{y}, \mathsf{u}$ for random variables and italic letters $x, y, u$ for
the realizations of the random variables or for constants. We denote by $\Pr^{\mathsf{x}}(A)$ the probability of event $A$ under
the measure of $\mathsf{x}$. We use bits and $\log_2$ in this paper.

Consider a sequence of signals $x_1, x_2, \ldots, x_n$, where the $x_i$'s are zero most of the time. When $x_i$ is non-zero, it is
an arbitrary real number. In the signal processing literature, the signal $x^n$ is called sparse if most of its entries are
zero. In their seminal work on compressive sensing [3] and [6], Candes, Tao and Donoho show that, to exactly
reconstruct the sparse signal $x^n$, only a fraction of $n$ measurements are needed. Furthermore, the reconstruction can
be done by an efficient algorithm based on linear programming. In the compressed sensing literature, the non-zero
entries of the sparse signals are arbitrary real numbers without any statistical distribution assigned to them. Furthermore,
the compressed sensing system tries to recover the signal $x^n$ losslessly, without distortion of the reconstructed
signal. These assumptions are not completely valid if the source statistics are known to the coding system or, more
importantly, if the goal of the sensing system is only to recover the data within a certain distortion. See also the recent
work by Fletcher et al. [7], [8], [9]. What is lacking in the previous study of this problem is a systematic study
of the information theoretic bounds on the rate distortion functions of the sources. In this paper, we give both lower
and upper bounds on the rate distortion functions.

A. Bernoulli-Gaussian random variable $\Xi(p, \sigma^2)$

The information theoretic model of the "sparse Gaussian" signals is captured by what we call a
Bernoulli-Gaussian random variable.

Definition 1: A random variable $\mathsf{x}$ is Bernoulli-Gaussian, denoted by $\Xi(p, \sigma^2)$, if $\mathsf{x} = \mathsf{b} \times \mathsf{s}$, where $\mathsf{s}$ is a
Gaussian random variable with mean $0$ and variance $\sigma^2$, $\mathsf{s} \sim N(0, \sigma^2)$, and $\mathsf{b}$ is a Bernoulli-$p$ random variable
independent of $\mathsf{s}$, with $\Pr(\mathsf{b} = 0) = 1 - p$ and $\Pr(\mathsf{b} = 1) = p$, $p \in [0, 1]$.
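
As a minimal sketch of Definition 1, the following Python snippet draws i.i.d. samples $x_i = b_i \times s_i$. The function name `sample_bernoulli_gaussian`, the seed and the parameter values are illustrative assumptions, not part of the paper.

```python
import random

def sample_bernoulli_gaussian(n, p, sigma2=1.0, rng=None):
    """Draw n i.i.d. samples x_i = b_i * s_i, b_i ~ Bernoulli(p), s_i ~ N(0, sigma2)."""
    rng = rng or random.Random()
    sigma = sigma2 ** 0.5
    samples = []
    for _ in range(n):
        b = 1 if rng.random() < p else 0   # Bernoulli-p indicator of a non-zero entry
        s = rng.gauss(0.0, sigma)          # Gaussian value, drawn independently of b
        samples.append(b * s)
    return samples

# Illustrative parameters (assumed): p = 0.1, sigma^2 = 1.
x = sample_bernoulli_gaussian(10_000, p=0.1, rng=random.Random(0))
nonzero_fraction = sum(1 for v in x if v != 0.0) / len(x)   # should be close to p
second_moment = sum(v * v for v in x) / len(x)              # should be close to p * sigma^2
```

The empirical second moment is close to $p\sigma^2$, which is the distortion of the trivial all-zero reconstruction.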

This random variable is a mixture of a continuous random variable and a discrete random variable. This adds to
the difficulty of studying the rate distortion functions of this random variable. The main result of this paper is a lower
bound and an upper bound on the rate distortion functions of a sequence of independent random variables with
distribution $\Xi(p, \sigma^2)$. It will be clear soon in Proposition 1 that we only need to study the rate distortion functions
for $\mathsf{s} \sim N(0, 1)$, i.e. the rate distortion functions for $\Xi(p, 1)$. First, we review the definition of rate distortion
functions in both the average distortion and strong distortion sense.

B. Review of the rate distortion theory 

In the standard setup of rate distortion theory, the encoder maps $n$ i.i.d. random variables $\mathsf{x}^n \in \mathcal{X}^n$, $\mathsf{x} \sim p_{\mathsf{x}}$, into
$nR$ bits, and then the decoder reconstructs the original signal within a certain distortion. The encoder and decoder
are denoted by $f_n$ and $g_n$ respectively:

$$f_n : \mathcal{X}^n \to \{0, 1\}^{nR} \quad \text{and} \quad g_n : \{0, 1\}^{nR} \to \hat{\mathcal{X}}^n,$$

and the distortion is defined as $d(x^n, \hat{x}^n) = \frac{1}{n}\sum_{i=1}^{n} d(x_i, \hat{x}_i)$.

Definition 2: Rate distortion function ([4], pg. 341): the rate distortion function $R(D)$ is the infimum of rates
$R$ such that $(R, D)$ is in the rate distortion region of the source for a given distortion $D$, where the rate distortion
region is the closure of the set of achievable rate distortion pairs $(R, D)$, defined as follows. $(R, D)$ is said to be achievable
in the expected distortion sense if there exists a sequence of $(2^{nR}, n)$ rate distortion codes $(f_n, g_n)$ such that

$$\lim_{n\to\infty} E\big(d(\mathsf{x}^n, g_n(f_n(\mathsf{x}^n)))\big) \le D. \tag{1}$$

The strong sense of the rate distortion function is defined similarly, with the following criterion for the codes: for all
$\delta > 0$,

$$\lim_{n\to\infty} \Pr\big(d(\mathsf{x}^n, g_n(f_n(\mathsf{x}^n))) > D + \delta\big) = 0, \tag{2}$$

where, in this paper, the distortion function is $d(x^n, \hat{x}^n) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \hat{x}_i)^2$.
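
To make the strong-sense criterion (2) concrete, the sketch below estimates $\Pr(d(\mathsf{x}^n, \hat{\mathsf{x}}^n) > D + \delta)$ by Monte Carlo for the trivial zero-rate code $\hat{x}^n = 0$ on a Bernoulli-Gaussian source with unit variance, for which the expected distortion is $p$. The choice of this code, the helper names and all parameter values are assumptions made for illustration only.

```python
import random

def distortion(x, xhat):
    """Average squared-error distortion d(x^n, xhat^n) = (1/n) * sum_i (x_i - xhat_i)^2."""
    assert len(x) == len(xhat)
    return sum((a - b) ** 2 for a, b in zip(x, xhat)) / len(x)

def excess_probability(n, p, D, delta, trials, rng):
    """Monte Carlo estimate of Pr(d(x^n, 0) > D + delta) for the all-zero reconstruction."""
    exceed = 0
    for _ in range(trials):
        # x_i = b_i * s_i with b_i ~ Bernoulli(p) and s_i ~ N(0, 1)
        x = [rng.gauss(0.0, 1.0) if rng.random() < p else 0.0 for _ in range(n)]
        if distortion(x, [0.0] * n) > D + delta:
            exceed += 1
    return exceed / trials

rng = random.Random(1)
p = 0.1  # assumed sparsity level; the zero-rate code targets D = p
est_short = excess_probability(n=50, p=p, D=p, delta=0.05, trials=200, rng=rng)
est_long = excess_probability(n=5000, p=p, D=p, delta=0.05, trials=200, rng=rng)
```

With these assumed values the estimate for the longer block length should be much smaller than for the shorter one, as (2) requires of a good code in the strong sense.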

It turns out that the rate distortion functions for the average distortion and the strong distortion are the same
for discrete random variables (Chapter 13.6 of [4]). We can generalize this result easily to continuous random variables
whose variance is finite and whose probability density function satisfies the usual regularity conditions. The proof
can be carried out by quantizing the probability density function and then by using the proof for discrete random
variables in [4]. A somewhat detailed sketch of how this works is in Appendix A.

A good lossy coding system in the strong sense is not necessarily good in the expected distortion sense.
Consider the following example: a good lossy coder can miss the distortion constraint on a subset $T_n \subset \mathcal{R}^n$ with
asymptotically zero measure, $\lim_{n\to\infty} \Pr^{\mathsf{x}}(T_n) = 0$. However, the lossy coder can intentionally make the distortion
on $T_n$ no smaller than $\frac{2D}{\Pr(T_n)}$, hence the expected distortion is at least $\Pr(T_n) \times \frac{2D}{\Pr(T_n)} = 2D$.

However, it is easy to see that given a good lossy coding system in the strong sense, we can easily make it also
good in the expected sense if the mean and variance of $\mathsf{x}$ are finite. We sketch the proof in Appendix B. So from
now on, when we say a lossy coding system is good in the strong sense, that implies that the system is also good
in the expected distortion sense.

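The following lemma characterizes the rate distortion function $R(D)$.

Lemma 1: Rate distortion theorem [10]:

$$R(D) = \min_{p_{\hat{\mathsf{x}}|\mathsf{x}}:\ \sum_{x,\hat{x}} p_{\mathsf{x}}(x)\, p_{\hat{\mathsf{x}}|\mathsf{x}}(\hat{x}|x)\, d(x,\hat{x}) \le D} I(\mathsf{x}; \hat{\mathsf{x}}). \tag{3}$$

Corollary 1: Rate distortion theorem for Gaussian random variables [2]: for a random variable $\mathsf{x} \sim N(0, \sigma^2)$, the
rate distortion function is:

$$R(D, N(0, \sigma^2)) = \begin{cases} \frac{1}{2}\log\frac{\sigma^2}{D}, & 0 \le D \le \sigma^2, \\ 0, & D > \sigma^2. \end{cases} \tag{4}$$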
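
The Gaussian rate distortion function (4) is simple enough to evaluate directly. The following sketch uses base-2 logarithms, in line with the paper's convention of measuring rate in bits; the function name `gaussian_rate_distortion` is an illustrative choice.

```python
import math

def gaussian_rate_distortion(D, sigma2):
    """R(D, N(0, sigma2)) in bits: 0.5 * log2(sigma2 / D) if 0 < D <= sigma2, else 0."""
    if D <= 0:
        raise ValueError("D must be positive")
    return max(0.5 * math.log2(sigma2 / D), 0.0)

rate_quarter = gaussian_rate_distortion(0.25, 1.0)   # 0.5 * log2(4) bits
rate_coarse = gaussian_rate_distortion(2.0, 1.0)     # D above the variance: zero rate
```

The same function also illustrates the scaling identity $R(\frac{D}{p}, N(0, 1)) = R(D, N(0, p))$, since both sides depend only on the ratio of variance to distortion.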

It is also shown that, with the same variance and squared distortion measure, Gaussian random variables require
the most bits to be described. Both lower and upper bounds are given in Exercise 8 on Pg. 370 [4]. The proof can
be found in [2].

Corollary 2: Rate distortion bounds for continuous random variables under the squared distortion measure (Exercise
8 on pg. 372 [4]): for a random variable $\mathsf{x}$ with variance $\sigma^2$, the rate distortion function $R(D)$ can be bounded as:

$$h(\mathsf{x}) - \frac{1}{2}\log(2\pi e D) \le R(D) \le \max\Big\{\frac{1}{2}\log\frac{\sigma^2}{D},\ 0\Big\}. \tag{5}$$

The lower bound in Corollary 2 is known as the Shannon lower bound in the literature [4].

C. Rate distortion function for $\Xi(p, \sigma^2)$

The main goal of this paper is to derive an upper and a lower bound on the rate distortion function $R(D)$ of
the Bernoulli-Gaussian random variable $\Xi(p, \sigma^2)$. We denote this quantity by $R(D, \Xi(p, \sigma^2))$. We summarize some
obvious properties of $R(D, \Xi(p, \sigma^2))$ in the following four propositions. The proofs are in Appendix C.

First we explain why we only need to study $R(D, \Xi(p, 1))$. We write $R(D, \Xi(p, 1))$ as $R(D, p)$ in the rest of
the paper and investigate $R(D, p)$.

Proposition 1: $R(D, \Xi(p, \sigma^2)) = R(\frac{D}{\sigma^2}, \Xi(p, 1))$.

Now we give three obvious bounds on the rate distortion function $R(D, p)$.

Proposition 2: Upper bound 1 on $R(D, p)$:

$$R(D, p) \le H(p) + pR\Big(\frac{D}{p}, N(0, 1)\Big) = H(p) + pR(D, N(0, p)) \tag{6}$$

where $R(D, N(0, 1))$ is the Gaussian rate distortion function for $N(0, 1)$, defined in Corollary 1.

Proposition 3: Upper bound 2 on $R(D, p)$:

$$R(D, p) \le R(D, N(0, p)) \tag{7}$$

Proposition 4: A lower bound on $R(D, p)$:

$$R(D, p) \ge pR\Big(\frac{D}{p}, N(0, 1)\Big) = pR(D, N(0, p)) \tag{8}$$
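
The three bounds in Propositions 2, 3 and 4 can be evaluated numerically. The sketch below does so in bits; the helper names and the sample values of $D$ and $p$ are illustrative assumptions.

```python
import math

def binary_entropy(p):
    """H(p) in bits, with the convention H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def gaussian_rd(D, sigma2):
    """Gaussian rate distortion function R(D, N(0, sigma2)) in bits."""
    return max(0.5 * math.log2(sigma2 / D), 0.0)

def obvious_bounds(D, p):
    """Return (lower, upper1, upper2) from Propositions 4, 2 and 3 respectively."""
    lower = p * gaussian_rd(D, p)          # (8): genie reveals the non-zero locations
    upper1 = binary_entropy(p) + lower     # (6): describe the locations losslessly first
    upper2 = gaussian_rd(D, p)             # (7): Gaussian with the same variance p
    return lower, upper1, upper2

# Assumed sample point: sparse source (p = 0.1) at small distortion.
lower, upper1, upper2 = obvious_bounds(D=0.001, p=0.1)
gap = upper1 - lower                       # equals H(p), the known gap
```

At this sample point the gap between the first upper bound and the lower bound is exactly $H(p)$, which is the gap the rest of the paper sets out to narrow.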

We give a conceptually clear explanation of these three bounds. In Proposition 2, we construct a very simple
coding system that first losslessly describes the locations of the non-zero elements of $\mathsf{x}^n \sim \Xi(p, 1)$, then lossily
describes the values of these non-zero elements using a Gaussian lossy coder. Proposition 3 is proved by using
the well known fact that for continuous random variables with the same variance and distortion measure, Gaussian
sequences require the highest rate. The difficulty is that $\Xi(p, 1)$ is not a continuous random variable. We approximate
it by a sequence of continuous random variables whose rate distortion functions converge to that of $\Xi(p, 1)$. In
the proof of Proposition 4, we reduce a Bernoulli-Gaussian sequence to a Gaussian sequence by letting the decoder know the
non-zero locations for free, and derive a lower bound on $R(D, p)$ from the Gaussian rate distortion function.

The more rigorous proofs of these bounds are in Appendix C. It is non-trivial to bound the rate distortion
function of one random variable $\mathsf{x}$ by the rate distortion function of another random variable $\mathsf{y}$. To show that
$R(D, \mathsf{x}) \le R(D, \mathsf{y})$, the technique we use in the proofs of the above four propositions is to construct a good lossy
coding system for $\mathsf{x}$ from a good lossy coding system for $\mathsf{y}$ under the same rate-distortion constraint $R$ and $D$.

Among the three bounds described in Propositions 2, 3 and 4, we find the lower bound the most unsatisfactory.
The Shannon lower bound [4] does not apply to the Bernoulli-Gaussian random variable $\Xi(p, 1)$ because the differential
entropy of $\Xi(p, 1)$ is negative infinity. This paper focuses on deriving a more information-theoretically interesting
lower bound on $R(D, p)$. In the next several sections, we investigate the lower bound problem. As a simple corollary
of this new lower bound, we give a closed form lower bound on the rate distortion function in Section VIII that improves
the previously known result by $p\log_2\frac{1}{p}$ in the high resolution regime ($\frac{D}{p} \ll 1$).
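
To see how much of the known gap $H(p)$ an improvement of $p\log_2\frac{1}{p}$ closes for sparse sources, the following sketch tabulates the ratio $p\log_2\frac{1}{p} / H(p)$ for decreasing $p$. The function names and the grid of $p$ values are illustrative assumptions.

```python
import math

def binary_entropy(p):
    """H(p) in bits for 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def improvement_ratio(p):
    """Fraction of the gap H(p) covered by an improvement of p * log2(1/p)."""
    return p * math.log2(1.0 / p) / binary_entropy(p)

# Assumed grid: p = 10^-1, 10^-2, ..., 10^-6.
ratios = [improvement_ratio(10.0 ** -k) for k in range(1, 7)]
```

The ratios increase toward 1 as $p$ decreases, though slowly, because the remaining term $(1-p)\log_2\frac{1}{1-p}$ is of order $p$ while $p\log_2\frac{1}{p}$ is larger only by a logarithmic factor.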

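II. An improved lower bound on $R(D, p)$

First, we reiterate the definition of a strong lossy source coding system for a Bernoulli-Gaussian sequence
$\mathsf{x}^n \sim \Xi(p, 1)$, where $\mathsf{x} = \mathsf{b} \times \mathsf{s}$, $\mathsf{b}$ is a Bernoulli-$p$ random variable and $\mathsf{s} \sim N(0, 1)$ is a Gaussian random
variable. An $(R, D)$ encoder-decoder sequence $f_n, g_n$ does the following:

$$f_n : \mathcal{R}^n \to \{0, 1\}^{nR},\ f_n(\mathsf{x}^n) = \mathsf{a}^{nR} \quad \text{and} \quad g_n : \{0, 1\}^{nR} \to \mathcal{R}^n,\ g_n(\mathsf{a}^{nR}) = \hat{\mathsf{x}}^n.$$

From the definition of the rate distortion function in the strong sense in (2), we have for all $\delta_1 > 0$:

$$\Pr(d(\mathsf{x}^n, \hat{\mathsf{x}}^n) > D + \delta_1) = \Pr(d(\mathsf{x}^n, g_n(f_n(\mathsf{x}^n))) > D + \delta_1) = e_n(\delta_1) \ \text{and} \ \lim_{n\to\infty} e_n(\delta_1) = 0. \tag{9}$$

Recall that we can have a good lossy coder in both the strong sense and the expected distortion sense,
according to the discussion in Appendix B. So we assume that the good coding system $f_n, g_n$ here is good in
both senses. So let

$$E(d(\mathsf{x}^n, \hat{\mathsf{x}}^n)) = E(d(\mathsf{x}^n, g_n(f_n(\mathsf{x}^n)))) = D + \varsigma_n, \ \text{then} \ \lim_{n\to\infty} \varsigma_n = 0. \tag{10}$$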

Notice that $\mathsf{x}^n = \mathsf{b}^n \times \mathsf{s}^n$, where the multiplication $\times$ here is done entry by entry, so that if $b_i = 0$, the value
of $s_i$ does not have any impact on $x^n$. The output of the encoder $f_n$ is a random variable that is a function of the
sequence $\mathsf{x}^n$; we write the output as $\mathsf{a}^{nR} = f_n(\mathsf{x}^n)$. Our investigation of the rate distortion function relies on the
properties of the encoder output $\mathsf{a}^{nR}$.


Fig. 1. A lossy source coding system (encoder $f_n$, decoder $g_n$) for the Bernoulli-Gaussian sequence $\mathsf{x}^n = \mathsf{b}^n \times \mathsf{s}^n$

In Proposition 4, the lower bound is derived by letting a genie tell the decoder the non-zero positions of the
Bernoulli-Gaussian sequence, i.e. the $\mathsf{b}^n$ part of $\mathsf{x}^n = \mathsf{b}^n \times \mathsf{s}^n$, and the rate is only counted for the lossy source
coding of the non-zero Gaussian subsequence $\tilde{\mathsf{s}}^{1(\mathsf{b}^n)}$, where $1(b^n)$ is the number of 1's in the sequence $b^n$ and $\tilde{s}_i = s_{l_i}$
if $b_{l_i} = 1$, $i = 1, 2, \ldots, 1(b^n)$. To tighten the lower bound in Proposition 4, we need to drop the genie who lets the
decoder know the entirety of $\mathsf{b}^n$. In the following several sections, we attempt to tighten the lower bound by
investigating the information about $\mathsf{b}^n$ that has to be transmitted to the decoder.

First we summarize our main result in the following theorem. 

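Theorem 1: Main theorem: a new lower bound on the rate distortion function $R(D, p)$ for the Bernoulli-Gaussian
random variable $\Xi(p, 1)$ under distortion constraint $D$:

$$R(D, p) \ge pR(D, N(0, p)) + \bar{R},$$

$$\text{where } \bar{R} = \max_{L \ge 0}\Big\{\min_{U \ge L,\ r \in [0, 1-p]:\ T_1(L, U, r) \le D} h(L, U, r)\Big\}, \tag{11}$$

$h(L, U, r)$ is defined by cases according to whether $p \times \Pr(|\mathsf{s}| > U) + r < p$, and $\mathsf{s} \sim N(0, 1)$ is a Gaussian random variable.

Proof: The theorem is a corollary of Lemmas 2, 3, 4 and 5:

$$\begin{aligned}
R(D, p) &\ge \frac{I(\mathsf{a}^{nR}; \mathsf{s}^n | \mathsf{b}^n)}{n} + \frac{I(\mathsf{a}^{nR}; \mathsf{b}^n)}{n} && (12) \\
&\ge pR(D - (1-p)E[\hat{\mathsf{x}}^2 | \mathsf{b} = 0], N(0, p)) + \frac{I(\mathsf{a}^{nR}; \mathsf{b}^n)}{n} && (13) \\
&\ge pR(D - (1-p)E[\hat{\mathsf{x}}^2 | \mathsf{b} = 0], N(0, p)) + \bar{R} && (14) \\
&\ge pR(D, N(0, p)) + \bar{R} && (15)
\end{aligned}$$

(12) is proved in Lemma 2, (13) is proved in Lemma 3, (14) is proved in Lemmas 4 and 5, and $\bar{R}$ is defined in (11).
(15) follows from the fact that the rate distortion function for Gaussian random variables $R(D, N(0, p))$ is monotonically decreasing
in $D$. ■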

There are four parts in our investigation. First, in Section III, we lower bound the number of bits $nR$ by the
sum of two mutual information terms. The first one is the conditional mutual information between the output
of the encoder $\mathsf{a}^{nR}$ and the Gaussian sequence $\mathsf{s}^n$ given the Bernoulli sequence $\mathsf{b}^n$: $I(\mathsf{a}^{nR}; \mathsf{s}^n | \mathsf{b}^n)$. The second
is the mutual information between the output of the encoder $\mathsf{a}^{nR}$ and the Bernoulli sequence $\mathsf{b}^n$: $I(\mathsf{a}^{nR}; \mathsf{b}^n)$.
Then, in Section IV, we lower bound $I(\mathsf{a}^{nR}; \mathsf{s}^n | \mathsf{b}^n)$ by using a simple argument similar to that in Proposition 4. In
Section V, we lower bound $I(\mathsf{a}^{nR}; \mathsf{b}^n)$ by the capacity of the lossy coding channel, while the capacity of the channel
is left unspecified. In Section VI, we give a lower bound on the channel capacity by using a random coding argument.
Finally, in Theorem 1, we combine these bounds to give a lower bound on the rate distortion function
$R(D, p)$ for the Bernoulli-Gaussian random sequence $\Xi(p, 1)$ under distortion constraint $D$. The investigation spans
the next four sections of this paper.

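III. First step: lower bounding $nR$ by the sum of two mutual information terms $I(\mathsf{a}^{nR}; \mathsf{b}^n) + I(\mathsf{a}^{nR}; \mathsf{s}^n | \mathsf{b}^n)$

First we have the following simple lemma, which tells us that the rate is lower bounded by the sum of two mutual
information terms $I(\mathsf{a}^{nR}; \mathsf{b}^n) + I(\mathsf{a}^{nR}; \mathsf{s}^n | \mathsf{b}^n)$, where $\mathsf{a}^{nR}$ is the output of the lossy encoder and $\mathsf{b}^n$ and $\mathsf{s}^n$ are the
Bernoulli sequence and the Gaussian sequence that generate the Bernoulli-Gaussian sequence $\mathsf{x}^n \sim \Xi(p, 1)$.

Lemma 2: For the lossy coding system shown in Figure 1, the rate of the lossy source coding system can be lower
bounded as follows:

$$nR \ge I(\mathsf{a}^{nR}; \mathsf{b}^n) + I(\mathsf{a}^{nR}; \mathsf{s}^n | \mathsf{b}^n)$$

Proof: The output of the encoder $\mathsf{a}^{nR} \in \{0, 1\}^{nR}$, so the entropy of the random variable is upper bounded by

$$H(\mathsf{a}^{nR}) \le nR \tag{16}$$

Notice that $\mathsf{a}^{nR}$ is a function of $\mathsf{x}^n$, i.e. a function of $\mathsf{s}^n$ and $\mathsf{b}^n$, so $H(\mathsf{a}^{nR} | \mathsf{s}^n, \mathsf{b}^n) = 0$ and

$$H(\mathsf{a}^{nR}) = H(\mathsf{a}^{nR}) - H(\mathsf{a}^{nR} | \mathsf{s}^n, \mathsf{b}^n) \tag{17}$$

Combining (16) and (17), and noticing that $\mathsf{b}^n \perp \mathsf{s}^n$, we have:

$$\begin{aligned}
nR &\ge H(\mathsf{a}^{nR}) - H(\mathsf{a}^{nR} | \mathsf{s}^n, \mathsf{b}^n) \\
&= I(\mathsf{a}^{nR}; \mathsf{s}^n, \mathsf{b}^n) \\
&= I(\mathsf{a}^{nR}; \mathsf{b}^n) + I(\mathsf{a}^{nR}; \mathsf{s}^n | \mathsf{b}^n) && (18)
\end{aligned}$$

where (18) is true by the chain rule for mutual information [4]. □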
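
The identities used in the proof of Lemma 2 can be checked on a small discrete toy model. In the sketch below, the Gaussian $\mathsf{s}$ is replaced by a uniform $\pm 1$ variable and the encoder is a hypothetical one-bit map; both are assumptions made only so that all entropies can be computed exactly, and neither comes from the paper.

```python
import math
from collections import defaultdict

def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(q * math.log2(q) for q in pmf.values() if q > 0)

def marginal(joint, idx):
    """Marginal pmf of the coordinates listed in idx."""
    out = defaultdict(float)
    for outcome, q in joint.items():
        out[tuple(outcome[i] for i in idx)] += q
    return out

def mutual_information(joint, left, right, given=()):
    """I(left; right | given) computed from joint entropies."""
    def h(idx):
        return entropy(marginal(joint, idx))
    return h(left + given) + h(right + given) - h(left + right + given) - h(given)

p = 0.3                                   # assumed Bernoulli parameter
joint = defaultdict(float)                # joint pmf of (a, s, b)
for b in (0, 1):
    for s in (-1, 1):                     # discrete stand-in for the Gaussian s
        prob = (p if b == 1 else 1 - p) * 0.5
        x = b * s
        a = 1 if x > 0 else 0             # hypothetical one-bit encoder a = f(x)
        joint[(a, s, b)] += prob

A, S, B = (0,), (1,), (2,)
H_a = entropy(marginal(joint, A))
I_joint = mutual_information(joint, A, S + B)
I_b = mutual_information(joint, A, B)
I_s_given_b = mutual_information(joint, A, S, given=B)
```

Because the encoder output is a deterministic function of $(s, b)$, `H_a` equals `I_joint`, and the chain rule splits it into `I_b + I_s_given_b`, mirroring (17) and (18) with $nR = 1$.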
IV. Lower bounding $I(\mathsf{a}^{nR}; \mathsf{s}^n | \mathsf{b}^n)$, Proposition 4 revisited

In this section we lower bound the conditional mutual information term $I(\mathsf{a}^{nR}; \mathsf{s}^n | \mathsf{b}^n)$ in the lower bound on
$nR$ in (18). From Proposition 4, we know that even if a genie tells the non-zero locations of $\mathsf{x}^n$ to the decoder, the
coding system still needs at least $npR(D, N(0, p))$ bits to describe the values of the non-zero entries of $\mathsf{x}^n$. In the
proof of Proposition 4, like the proofs of the other propositions in Section I, we use the lossy source coding system
for the Bernoulli-Gaussian sequences to construct a lossy source coding system for a random sequence with a known
rate distortion function.

The proof here, however, is trickier in the sense that we are not bounding the rate distortion function $R(D, p)$;
instead we only bound the conditional mutual information $I(\mathsf{a}^{nR}; \mathsf{s}^n | \mathsf{b}^n)$, which is part of the rate. Hence we cannot
construct a lossy coder for a sequence with a known rate distortion function using the lossy coder for the Bernoulli-Gaussian
sequence. Instead, we use the classical technique in [4].

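Lemma 3: Lower bound on $I(\mathsf{a}^{nR}; \mathsf{s}^n | \mathsf{b}^n)$:

$$I(\mathsf{a}^{nR}; \mathsf{s}^n | \mathsf{b}^n) \ge npR(D - (1 - p)E[\hat{\mathsf{x}}^2 | \mathsf{b} = 0], N(0, p)), \tag{19}$$

where

$$E[\hat{\mathsf{x}}^2 | \mathsf{b} = 0] = \frac{1}{n}\sum_{i=1}^{n} E[\hat{\mathsf{x}}_i^2 | \mathsf{b}_i = 0] = \frac{1}{n(1-p)}\sum_{b^n} \Pr(\mathsf{b}^n = b^n) \sum_{i: b_i = 0} E[\hat{\mathsf{x}}_i^2 | \mathsf{b}^n = b^n]. \tag{20}$$

Proof: The proof is similar to the lower bound proof for the Gaussian rate distortion function on Page 345 of [4].
First, notice that the estimate $\hat{\mathsf{x}}^n = g_n(\mathsf{a}^{nR})$ is a function of $\mathsf{a}^{nR}$, and $\mathsf{a}^{nR} = f_n(\mathsf{x}^n) = f_n(\mathsf{b}^n \times \mathsf{s}^n)$. Hence we
have the following Markov chain:

$$\mathsf{b}^n \times \mathsf{s}^n \to \mathsf{a}^{nR} \to \hat{\mathsf{x}}^n \tag{21}$$

From the data processing theorem [4], we know that $I(\mathsf{a}^{nR}; \mathsf{s}^n | \mathsf{b}^n) \ge I(\hat{\mathsf{x}}^n; \mathsf{s}^n | \mathsf{b}^n)$. For a binary sequence $b^n \in \{0, 1\}^n$,
let $1(b^n) = \sum_{i=1}^{n} b_i$ be the number of 1's in $b^n$. Since $b_i \in \{0, 1\}$, if $b_i = 0$ then $\hat{\mathsf{x}}^n$ and $\mathsf{s}_i$ are independent,
because in that case $\mathsf{x}_i = b_i \times \mathsf{s}_i = 0$, $\mathsf{s}^n$ is i.i.d. and $\hat{\mathsf{x}}^n$ is a deterministic function of $\mathsf{x}^n$. Write $i_1, \ldots, i_{1(b^n)}$ for
the non-zero positions of $b^n$, and let $\mathcal{I}(b^n) = \{i_1, \ldots, i_{1(b^n)}\}$, then

$$I(\hat{\mathsf{x}}^n; \mathsf{s}^n | \mathsf{b}^n = b^n) = I(\hat{\mathsf{x}}^n; \mathsf{s}_{i_1}, \ldots, \mathsf{s}_{i_{1(b^n)}} | \mathsf{b}^n = b^n). \tag{22}$$

Define the $\epsilon_1$-strong typical set $B^n_{\epsilon_1}$ for binary sequences:

$$B^n_{\epsilon_1} \triangleq \Big\{b^n \in \{0, 1\}^n : \Big|\frac{1(b^n)}{n} - p\Big| \le \epsilon_1\Big\}.$$

From the AEP [4], letting $\Pr(\mathsf{b}^n \notin B^n_{\epsilon_1}) = \nu_n$, we have $\lim_{n\to\infty} \nu_n = 0$.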

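Now we have:

$$\begin{aligned}
I(\mathsf{a}^{nR}; \mathsf{s}^n | \mathsf{b}^n) &\ge I(\hat{\mathsf{x}}^n; \mathsf{s}^n | \mathsf{b}^n) && (23) \\
&= \sum_{b^n \in \{0,1\}^n} \Pr(\mathsf{b}^n = b^n)\, I(\hat{\mathsf{x}}^n; \mathsf{s}^n | \mathsf{b}^n = b^n) && (24) \\
&\ge \sum_{b^n \in B^n_{\epsilon_1}} \Pr(\mathsf{b}^n = b^n)\, I(\hat{\mathsf{x}}^n; \mathsf{s}^n | \mathsf{b}^n = b^n) && (25) \\
&= \sum_{b^n \in B^n_{\epsilon_1}} \Pr(\mathsf{b}^n = b^n)\, I(\hat{\mathsf{x}}^n; \mathsf{s}_{i_1}, \ldots, \mathsf{s}_{i_{1(b^n)}} | \mathsf{b}^n = b^n) && (26) \\
&= \sum_{b^n \in B^n_{\epsilon_1}} \Pr(\mathsf{b}^n = b^n) \Big(H(\mathsf{s}_{i_1}, \ldots, \mathsf{s}_{i_{1(b^n)}} | \mathsf{b}^n = b^n) - H(\mathsf{s}_{i_1}, \ldots, \mathsf{s}_{i_{1(b^n)}} | \hat{\mathsf{x}}^n, \mathsf{b}^n = b^n)\Big) \\
&\ge \sum_{b^n \in B^n_{\epsilon_1}} \Pr(\mathsf{b}^n = b^n) \Big(\sum_{j=1}^{1(b^n)} H(\mathsf{s}_{i_j}) - \sum_{j=1}^{1(b^n)} H(\mathsf{s}_{i_j} | \hat{\mathsf{x}}^n, \mathsf{b}^n = b^n)\Big) && (27) \\
&= \sum_{b^n \in B^n_{\epsilon_1}} \Pr(\mathsf{b}^n = b^n) \Big(\sum_{j=1}^{1(b^n)} H(\mathsf{s}_{i_j}) - \sum_{j=1}^{1(b^n)} H(\mathsf{s}_{i_j} - \hat{\mathsf{x}}_{i_j} | \hat{\mathsf{x}}^n, \mathsf{b}^n = b^n)\Big) \\
&\ge \sum_{b^n \in B^n_{\epsilon_1}} \Pr(\mathsf{b}^n = b^n) \Big(\sum_{j=1}^{1(b^n)} H(\mathsf{s}_{i_j}) - \sum_{j=1}^{1(b^n)} H(\mathsf{s}_{i_j} - \hat{\mathsf{x}}_{i_j} | \mathsf{b}^n = b^n)\Big) && (28)
\end{aligned}$$

(24) follows from the definition of conditional mutual information, (25) is true because mutual information is non-negative,
and (26) follows from (22). (27) is true because $\mathsf{s}^n$ is i.i.d. and independent of $\mathsf{b}^n$. The rest are obvious. $\mathsf{s}_i \sim N(0, 1)$,
so $H(\mathsf{s}_i) = \frac{1}{2}\log(2\pi e)$. According to Theorem 9.6.5 in [4], Gaussian random variables maximize the entropy over all
distributions with the same covariance, so:

$$H(\mathsf{s}_{i_j} - \hat{\mathsf{x}}_{i_j} | \mathsf{b}^n = b^n) \le H\big(N(0, E[(\mathsf{s}_{i_j} - \hat{\mathsf{x}}_{i_j})^2 | \mathsf{b}^n = b^n])\big) = \frac{1}{2}\log\big(2\pi e\, E[(\mathsf{s}_{i_j} - \hat{\mathsf{x}}_{i_j})^2 | \mathsf{b}^n = b^n]\big).$$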
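Now (28) becomes:

$$\begin{aligned}
I(\mathsf{a}^{nR}; \mathsf{s}^n | \mathsf{b}^n) &\ge \sum_{b^n \in B^n_{\epsilon_1}} \Pr(\mathsf{b}^n = b^n) \sum_{j=1}^{1(b^n)} -\frac{1}{2}\log\big(E[(\mathsf{s}_{i_j} - \hat{\mathsf{x}}_{i_j})^2 | \mathsf{b}^n = b^n]\big) \\
&= \sum_{b^n \in B^n_{\epsilon_1}} \sum_{j=1}^{1(b^n)} \Pr(\mathsf{b}^n = b^n) \Big(-\frac{1}{2}\log\big(E[(\mathsf{s}_{i_j} - \hat{\mathsf{x}}_{i_j})^2 | \mathsf{b}^n = b^n]\big)\Big) \\
&= \Big(\sum_{b^n \in B^n_{\epsilon_1}} \sum_{j=1}^{1(b^n)} \Pr(\mathsf{b}^n = b^n)\Big) \sum_{b^n \in B^n_{\epsilon_1}} \sum_{j=1}^{1(b^n)} \frac{\Pr(\mathsf{b}^n = b^n)}{\sum_{b^n \in B^n_{\epsilon_1}} \sum_{j=1}^{1(b^n)} \Pr(\mathsf{b}^n = b^n)} \Big(-\frac{1}{2}\log\big(E[(\mathsf{s}_{i_j} - \hat{\mathsf{x}}_{i_j})^2 | \mathsf{b}^n = b^n]\big)\Big) \\
&\ge -\frac{1}{2}\log\Big(\sum_{b^n \in B^n_{\epsilon_1}} \sum_{j=1}^{1(b^n)} \frac{\Pr(\mathsf{b}^n = b^n)}{\sum_{b^n \in B^n_{\epsilon_1}} \sum_{j=1}^{1(b^n)} \Pr(\mathsf{b}^n = b^n)} E[(\mathsf{s}_{i_j} - \hat{\mathsf{x}}_{i_j})^2 | \mathsf{b}^n = b^n]\Big) \times \Big(\sum_{b^n \in B^n_{\epsilon_1}} \sum_{j=1}^{1(b^n)} \Pr(\mathsf{b}^n = b^n)\Big) && (29)
\end{aligned}$$

(29) follows from the fact that $-\log(\cdot)$ is convex $\cup$. We bound the two terms as follows. First:

$$\begin{aligned}
\sum_{b^n \in B^n_{\epsilon_1}} \sum_{j=1}^{1(b^n)} \Pr(\mathsf{b}^n = b^n) &= \sum_{b^n \in B^n_{\epsilon_1}} 1(b^n) \Pr(\mathsf{b}^n = b^n) \\
&\ge \sum_{b^n \in B^n_{\epsilon_1}} n(p - \epsilon_1) \Pr(\mathsf{b}^n = b^n) \\
&\ge n(p - \epsilon_1)(1 - \nu_n) && (30)
\end{aligned}$$

Before bounding the other term, we have the following observation:

$$\begin{aligned}
\sum_{b^n \in B^n_{\epsilon_1}} \sum_{j=1}^{1(b^n)} \Pr(\mathsf{b}^n = b^n) E[(\mathsf{s}_{i_j} - \hat{\mathsf{x}}_{i_j})^2 | \mathsf{b}^n = b^n]
&\le \sum_{b^n} \sum_{j=1}^{1(b^n)} \Pr(\mathsf{b}^n = b^n) E[(\mathsf{x}_{i_j} - \hat{\mathsf{x}}_{i_j})^2 | \mathsf{b}^n = b^n] \\
&\le n(D + \varsigma_n) - \sum_{b^n} \sum_{i \notin \mathcal{I}(b^n)} \Pr(\mathsf{b}^n = b^n) E[(\mathsf{x}_i - \hat{\mathsf{x}}_i)^2 | \mathsf{b}^n = b^n] && (31) \\
&\le n(D + \varsigma_n) - \sum_{b^n} \Pr(\mathsf{b}^n = b^n) \sum_{i \notin \mathcal{I}(b^n)} E[\hat{\mathsf{x}}_i^2 | \mathsf{b}^n = b^n] \\
&= n(D + \varsigma_n) - n(1 - p)E[\hat{\mathsf{x}}^2 | \mathsf{b} = 0] && (32)
\end{aligned}$$

where $\mathcal{I}(b^n) = \{i_1, \ldots, i_{1(b^n)}\}$ and $\varsigma_n \to 0$ as $n$ goes to infinity. (31) follows from the fact that $f_n, g_n$ is good in the
expected distortion sense as well, see (10). So the first term in (29) can be lower bounded as follows, combining (30)
and (32):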

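$$-\frac{1}{2}\log\Big(\sum_{b^n \in B^n_{\epsilon_1}} \sum_{j=1}^{1(b^n)} \frac{\Pr(\mathsf{b}^n = b^n)}{\sum_{b^n \in B^n_{\epsilon_1}} \sum_{j=1}^{1(b^n)} \Pr(\mathsf{b}^n = b^n)} E[(\mathsf{s}_{i_j} - \hat{\mathsf{x}}_{i_j})^2 | \mathsf{b}^n = b^n]\Big) \ge -\frac{1}{2}\log\Big(\frac{D + \varsigma_n - (1 - p)E[\hat{\mathsf{x}}^2 | \mathsf{b} = 0]}{(p - \epsilon_1)(1 - \nu_n)}\Big) \tag{33}$$

First notice that we are lower bounding a conditional mutual information $I(\mathsf{a}^{nR}; \mathsf{s}^n | \mathsf{b}^n)$, which is non-negative,
so we assume the first term is positive, or else we lower bound the conditional mutual information by 0. Substituting
(30) and (33) into (29), we have:

$$I(\mathsf{a}^{nR}; \mathsf{s}^n | \mathsf{b}^n) \ge n(p - \epsilon_1)(1 - \nu_n) \max\Big\{0, \frac{1}{2}\log\frac{(p - \epsilon_1)(1 - \nu_n)}{D + \varsigma_n - (1 - p)E[\hat{\mathsf{x}}^2 | \mathsf{b} = 0]}\Big\} \tag{34}$$

Notice that $\epsilon_1$ is an arbitrary positive real number, and both $\nu_n$ and $\varsigma_n$ go to zero as $n$ goes to infinity, so we
have just shown that

$$I(\mathsf{a}^{nR}; \mathsf{s}^n | \mathsf{b}^n) \ge np \times \max\Big\{0, \frac{1}{2}\log\frac{p}{D - (1 - p)E[\hat{\mathsf{x}}^2 | \mathsf{b} = 0]}\Big\} = npR(D - (1 - p)E[\hat{\mathsf{x}}^2 | \mathsf{b} = 0], N(0, p)).$$

The lemma is proved. □

As a trivial corollary of Lemma 2 and Lemma 3, we have:

$$nR \ge I(\mathsf{a}^{nR}; \mathsf{b}^n) + I(\mathsf{a}^{nR}; \mathsf{s}^n | \mathsf{b}^n) \ge npR(D - (1 - p)E[\hat{\mathsf{x}}^2 | \mathsf{b} = 0], N(0, p)) \ge npR(D, N(0, p)).$$

This also proves Proposition 4.

V. Lower bounding $I(\mathsf{a}^{nR}; \mathsf{b}^n)$ by the randomized channel capacity of a lossy compressor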

In this section we give a lower bound on the mutual information $I(\mathsf{a}^{nR}; \mathsf{b}^n)$ from a channel capacity perspective.
This is partly inspired by the seminal work in [1]. First we take another look at the whole lossy coding system
in Figure 1: we single out the binary randomness $\mathsf{b}^n$ and make the rest of the system a "lossy coding channel", as
shown in Figure 2. The channel input is a binary sequence $b^n \in \{0, 1\}^n$, and the channel output is $\mathsf{a}^{nR} \in \{0, 1\}^{nR}$.
What the channel does is to first multiply $b^n$ by a Gaussian random sequence $\mathsf{s}^n$ and then send the product to a good lossy
encoder $f_n$. The channel output is the output of the lossy encoder $f_n$.

Notice that this is not a standard communication channel. It is in some sense an arbitrarily varying channel. The
constraint on the channel is that the lossy coder pair $f_n, g_n$ is good in both the strong and the expected distortion
sense. The goal in this section is to lower bound the mutual information $I(\mathsf{a}^{nR}; \mathsf{b}^n)$ by the number of bits
(channel capacity) that can be reliably communicated across the channel on average over a randomized
codebook.

More interestingly, the input sequence $\mathsf{b}^n$ obeys the statistics of a Bernoulli process with non-zero probability $p$.
So it will soon be obvious that we need to investigate the channel capacity for the randomized codebooks where
each codeword is chosen according to its probability under the i.i.d. Bernoulli-$p$ distribution.



Fig. 2. A "lossy coding" channel derived from the lossy coding system (encoder $f_n$, decoder $g_n$) for the Bernoulli-Gaussian sequence $\mathsf{x}^n = \mathsf{b}^n \times \mathsf{s}^n$



As shown in Figure 3, we have a channel coding problem. A message $\mathsf{m}$ is a random variable uniformly distributed
on $\{1, 2, \ldots, 2^{nR}\}$. The constraint on the channel encoder $F_n$ is that the codeword $b^n$ is chosen for message $m$
with probability

$$p^{1(b^n)}(1 - p)^{n - 1(b^n)},$$

where $1(b^n)$ is the number of 1's in the sequence $b^n$; this will be explained in detail in Definition 3. The constraint
on the lossy coding channel is that the estimate of the Bernoulli-Gaussian random sequence $\mathsf{x}^n = \mathsf{b}^n \times \mathsf{s}^n$
through the lossy coding system $f_n, g_n$ is within a distortion $D + \delta_1$ of the true sequence $\mathsf{x}^n$ with probability approaching 1
asymptotically, for all $\delta_1 > 0$. Before giving the lemma on the lower bound of the mutual information $I(\mathsf{a}^{nR}; \mathsf{b}^n)$,
we give the following definition of the randomized channel capacity for the lossy source channel.

Definition 3: The randomized channel capacity for the lossy source channel is written as $R_p$. Let $B_n = \{0, 1\}^n$, and let
$\mathcal{C}(n)$ be the codebook set of rate $R$: $\mathcal{C}(n) = B_n^{2^{nR}}$ is the set product of $2^{nR}$ many $B_n$'s, $B_n^{2^{nR}} = B_n \times \cdots \times B_n$. A
codebook $C \in \mathcal{C}(n)$, $C = (c_1, c_2, \ldots, c_{2^{nR}})$, is such that the codeword for message $m$, $m = 1, 2, \ldots, 2^{nR}$, is the $m$-th
entry of $C$: $c_m$. From the definition, $c_m \in B_n$ for all $m$. We let $\mathsf{C}_p$ be a random variable distributed on $\mathcal{C}(n)$, such
that a codebook $C = (c_1, c_2, \ldots, c_{2^{nR}}) \in \mathcal{C}(n)$ is chosen as the codebook, i.e. $\mathsf{C}_p = C$, with the following probability:

$$\Pr(\mathsf{C}_p = C) = \prod_{m=1}^{2^{nR}} p^{1(c_m)}(1 - p)^{n - 1(c_m)} \tag{35}$$

Note: in this section we use $R$ to denote the channel capacity of the lossy coding channel. This is not the rate $R$ of the
lossy coding system.

Fig. 3. A channel coding system (channel encoder $F_n$, channel decoder $G_n$) for the "lossy coding" channel



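The average error probability of the randomized coding with codebook random variable $\mathsf{C}_p$ is defined as:

$$\begin{aligned}
e_{p,n}(R) &= \sum_{C \in B_n^{2^{nR}}} \Pr(\mathsf{C}_p = C) \Big(\frac{1}{2^{nR}} \sum_{m=1}^{2^{nR}} \Pr\big(m \ne \hat{\mathsf{m}}(\mathsf{a}^{nR}(c_m \times \mathsf{s}^n))\big)\Big) \\
&= \sum_{C \in B_n^{2^{nR}}} \Pr(\mathsf{C}_p = C) \big(\Pr(\mathsf{m} \ne \hat{\mathsf{m}}(\mathsf{a}^{nR}) | \mathsf{C}_p = C)\big) && (36)
\end{aligned}$$

where the error probability is over all codebooks in $\mathcal{C}(n) = B_n^{2^{nR}}$ with the distribution defined in (35) and all messages
$m \in \{1, 2, \ldots, 2^{nR}\}$, i.e. the random variable $\mathsf{m}$ is uniformly distributed in (36). Notice that in Figure 3, a codebook
$C$ is chosen and known to both the encoder and the decoder. The output of the channel encoder is $F_n(m) = c_m$,
the output of the lossy encoder is a random sequence $f_n(c_m \times \mathsf{s}^n) = \mathsf{a}^{nR}(c_m \times \mathsf{s}^n)$, and the estimate of $m$ is
$\hat{\mathsf{m}}(\mathsf{a}^{nR}(c_m \times \mathsf{s}^n)) = G_n(\mathsf{a}^{nR}(c_m \times \mathsf{s}^n))$.

The randomized channel capacity for the lossy coding system $f_n, g_n$ is $R_p$ if, for all $R < R_p$, there exists a
channel decoder $G_n$ such that the average error goes to zero as $n$ goes to infinity:

$$\lim_{n\to\infty} e_{p,n}(R) = 0, \quad \text{equivalently:} \quad R_p = \sup_{R:\ \lim_{n\to\infty} e_{p,n}(R) = 0} \{R\}.$$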
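
The codebook distribution (35) simply says that every entry of every codeword is drawn i.i.d. Bernoulli-$p$. The following sketch samples such a codebook and evaluates the logarithm of (35); the helper names and the toy values of $n$, $R$ and $p$ are illustrative assumptions.

```python
import math
import random

def sample_codebook(n, num_messages, p, rng):
    """Draw a codebook: num_messages codewords of length n, entries i.i.d. Bernoulli(p)."""
    return [[1 if rng.random() < p else 0 for _ in range(n)]
            for _ in range(num_messages)]

def codebook_log2_probability(codebook, p):
    """log2 Pr(C_p = C) = sum_m [1(c_m) * log2(p) + (n - 1(c_m)) * log2(1 - p)], as in (35)."""
    total = 0.0
    for c in codebook:
        ones = sum(c)                      # 1(c_m): number of 1's in the codeword
        total += ones * math.log2(p) + (len(c) - ones) * math.log2(1 - p)
    return total

# Assumed toy sizes: n = 8 and R = 0.5, hence 2^{nR} = 16 codewords.
n, R, p = 8, 0.5, 0.25
codebook = sample_codebook(n, 2 ** int(n * R), p, random.Random(2))
log_prob = codebook_log2_probability(codebook, p)
```

Working with the logarithm avoids numerical underflow, since the product in (35) has $n2^{nR}$ factors.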



The following lemma summarizes the main result in this section.

Lemma 4: Lower bounding the mutual information $I(\mathsf{a}^{nR}; \mathsf{b}^n)$ by the randomized capacity: for any $\epsilon > 0$ the
mutual information is lower bounded by the randomized lossy coding channel capacity:

$$\liminf_{n\to\infty} \frac{1}{n} I(\mathsf{a}^{nR}; \mathsf{b}^n) \ge R_p = \sup_{R:\ \lim_{n\to\infty} e_{p,n}(R) = 0} \{R\} \tag{37}$$

Proof: To show (37), from the definition of $R_p$ we know that it is enough to show that for all $R$ such that
$\lim_{n\to\infty} e_{p,n}(R) = 0$:

$$\liminf_{n\to\infty} \frac{1}{n} I(\mathsf{a}^{nR}; \mathsf{b}^n) \ge R.$$

First we take a new perspective on the Bernoulli sequence $\mathsf{b}^n$. Instead of letting $\mathsf{b}^n$ be generated i.i.d. from
the Bernoulli-$p$ random process, we first generate two auxiliary random variables $\mathsf{C}_p$ and $\mathsf{m}$, and then let $\mathsf{b}^n$ be a
function of the two auxiliary random variables in such a way that $\mathsf{b}^n$ is an i.i.d. Bernoulli-$p$ sequence.

We first generate a codebook random variable $\mathsf{C}_p$ according to the distribution described in (35), where the
codebook $\mathsf{C}_p = C = (c_1, \ldots, c_{2^{nR}})$ with the following probability:

$$\Pr(\mathsf{C}_p = C) = \prod_{m=1}^{2^{nR}} p^{1(c_m)}(1 - p)^{n - 1(c_m)}.$$

Then we pick the message random variable $\mathsf{m}$ uniformly on $\{1, 2, \ldots, 2^{nR}\}$. Finally we let the binary
sequence $\mathsf{b}^n$ be a function of $\mathsf{C}_p$ and $\mathsf{m}$, such that for $\mathsf{C}_p = C = (c_1, \ldots, c_{2^{nR}})$ and $\mathsf{m} = m$, $\mathsf{b}^n = c_m$. It is easy to
see that $\mathsf{b}^n$ chosen this way has the following distribution:

$$\Pr(\mathsf{b}^n = b^n) = p^{1(b^n)}(1 - p)^{n - 1(b^n)}.$$

So we have the following Markov chain:

$$(\mathsf{C}_p, \mathsf{m}) \to \mathsf{b}^n \to \mathsf{a}^{nR} \tag{38}$$

So from the data processing lemma and the chain rule for mutual information, we know that:

$$\begin{aligned}
I(\mathsf{a}^{nR}; \mathsf{b}^n) &\ge I(\mathsf{a}^{nR}; \mathsf{C}_p, \mathsf{m}) \\
&= I(\mathsf{a}^{nR}; \mathsf{m} | \mathsf{C}_p) + I(\mathsf{a}^{nR}; \mathsf{C}_p) \\
&\ge I(\mathsf{a}^{nR}; \mathsf{m} | \mathsf{C}_p) && (39)
\end{aligned}$$

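where the last inequality follows from the fact that mutual information is always non-negative. Now the overall error
probability is, as defined in (36):

$$e_{p,n}(R) = \sum_{C} \Pr(\mathsf{C}_p = C)\big(\Pr(\mathsf{m} \ne \hat{\mathsf{m}}(\mathsf{a}^{nR}) | \mathsf{C}_p = C)\big) \tag{40}$$

where $\Pr(\mathsf{m} \ne \hat{\mathsf{m}}(\mathsf{a}^{nR}) | \mathsf{C}_p = C)$ is the decoding error probability when the codebook $C$ is chosen. Hence this
is a standard communication problem, and we can use the technique detailed in Chapter 8.9 of [4] to lower bound the
mutual information $I(\mathsf{a}^{nR}; \mathsf{b}^n)$ by the rate $R$ at which reliable communication is possible. Notice that if the codebook
$C$ is chosen, we have the following Markov chain:

$$\mathsf{m} \to \mathsf{b}^n \to \mathsf{a}^{nR} \to \hat{\mathsf{m}}, \tag{41}$$

more specifically, $\mathsf{b}^n$ is a deterministic function of $\mathsf{m}$, and $\hat{\mathsf{m}}$ is a deterministic function of $\mathsf{a}^{nR}$. So we can apply Fano's
inequality (Theorem 2.11.1 of [4]) for any fixed codebook $C$:

$$H(\mathsf{m} | \mathsf{a}^{nR}, \mathsf{C}_p = C) \le 1 + \Pr(\mathsf{m} \ne \hat{\mathsf{m}}(\mathsf{a}^{nR}) | \mathsf{C}_p = C)\, nR \tag{42}$$

Now, from the standard information theoretic equalities:

$$\begin{aligned}
nR &= H(\mathsf{m}) \\
&= H(\mathsf{m} | \mathsf{C}_p = C) \\
&= H(\mathsf{m} | \mathsf{a}^{nR}, \mathsf{C}_p = C) + I(\mathsf{a}^{nR}; \mathsf{m} | \mathsf{C}_p = C) \\
&\le 1 + \Pr(\mathsf{m} \ne \hat{\mathsf{m}}(\mathsf{a}^{nR}) | \mathsf{C}_p = C)\, nR + I(\mathsf{a}^{nR}; \mathsf{m} | \mathsf{C}_p = C)
\end{aligned}$$

Multiplying both sides by $\Pr(\mathsf{C}_p = C)$ and summing over all $C \in B_n^{2^{nR}}$, we have:

$$\begin{aligned}
nR &\le 1 + \sum_{C} \Pr(\mathsf{C}_p = C)\big(\Pr(\mathsf{m} \ne \hat{\mathsf{m}}(\mathsf{a}^{nR}) | \mathsf{C}_p = C)\, nR + I(\mathsf{a}^{nR}; \mathsf{m} | \mathsf{C}_p = C)\big) \\
&= 1 + nR \times e_{p,n}(R) + I(\mathsf{a}^{nR}; \mathsf{m} | \mathsf{C}_p) && (43)
\end{aligned}$$

Finally, substituting (39) into (43), we have:

$$I(\mathsf{a}^{nR}; \mathsf{b}^n) \ge I(\mathsf{a}^{nR}; \mathsf{m} | \mathsf{C}_p) \ge nR - 1 - nR \times e_{p,n}(R).$$

So, if the randomized lossy coding capacity is above $R$, i.e. $\lim_{n\to\infty} e_{p,n}(R) = 0$, then

$$\liminf_{n\to\infty} \frac{1}{n} I(\mathsf{a}^{nR}; \mathsf{b}^n) \ge R.$$

□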
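
The final step of the proof is plain arithmetic on the Fano-type bound $nR - 1 - nR \times e_{p,n}(R)$. A minimal sketch, with hypothetical values for the block length, rate and error probability, shows how the per-symbol bound approaches $R$ as $n$ grows and the error vanishes:

```python
def fano_lower_bound(n, R, error_probability):
    """Lower bound n*R - 1 - n*R*e on the mutual information, from (43) and (39)."""
    return n * R - 1 - n * R * error_probability

# Hypothetical numbers: rate R = 0.5 with a decoding error that shrinks as n grows.
per_symbol = [
    fano_lower_bound(n, 0.5, e) / n
    for n, e in [(100, 0.1), (1000, 0.01), (10000, 0.001)]
]
```

The per-symbol values increase toward $0.5$: both the constant $1$ and the error term become negligible after normalizing by $n$.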
VI. Randomized channel capacity of a lossy compressor, a lower bound

In the previous section, we showed that the mutual information $I(\mathsf{a}^{nR}; \mathsf{b}^n)$ is lower bounded
by the randomized lossy coding capacity if the input codewords look like an i.i.d. Bernoulli-$p$ sequence. What was
missing in the previous section is a lower bound on the randomized capacity. In this section we study the capacity,
in particular the lower bound on the capacity. Notice that the encoder is using a randomized codebook according
to the distribution in (35). We only need to design the decoder $G_n$ in Figure 3. If we can show that for some
$R$ the average error probability $e_{p,n}(R)$ goes to zero as $n$ goes to infinity, then that $R$ is a lower
bound on the randomized lossy coding capacity $R_p$. We give a lower bound on $R_p$. As will be clear soon from our
derivation of the lower bound, this bound is not tight. However, this is our first effort to derive a non-trivial lower
bound on the rate distortion function $R(D, p)$.

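Lemma 5: A lower bound on the randomized lossy coding capacity:

$$R_p \ge \bar{R} = \max_{L \ge 0}\Big\{\min_{U \ge L,\ r \in [0, 1-p]:\ T_1(L, U, r) \le D} h(L, U, r)\Big\} \tag{44}$$

where $h(L, U, r)$ is defined by cases according to whether $p \times \Pr(|\mathsf{s}| > U) + r < p$ or $p \times \Pr(|\mathsf{s}| > U) + r \ge p$,
$\mathsf{s}$ in (44) is Gaussian $N(0, 1)$, and $T_1(L, U, r) = rL^2 + 2p\int (s - L)^2 \frac{1}{\sqrt{2\pi}} e^{-\frac{s^2}{2}}\, ds$.

Or equivalently, for all $R < \bar{R}$, the decoding error defined in (36) for the randomized coding scheme converges
to zero as $n$ goes to infinity:

$$\lim_{n\to\infty} e_{p,n}(R) = 0.$$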

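Proof: We first describe the decoder $G_n$. The codebook $C$ is chosen, i.e. $\mathsf{C}_p = C$. As shown in Figure 3, if a
message $m$ is to be sent, where $m \in \{1, 2, \ldots, 2^{nR}\}$ with equal probability, the binary output of the channel encoder
$F_n$ is $c_m$. After the modulation by the Gaussian sequence $\mathsf{s}^n$ and the lossy source encoder $f_n$, the channel
decoder $G_n$ receives $\mathsf{a}^{nR}$. The first step of $G_n$ is to run the lossy source decoder $g_n$ and get the lossy estimate of
$\mathsf{x}^n = c_m \times \mathsf{s}^n$, $\hat{\mathsf{x}}^n = g_n(\mathsf{a}^{nR})$. The second step of $G_n$ is to estimate $m$ from $\hat{\mathsf{x}}^n$. We pick the codeword with the
most entries whose absolute value is above the positive real number $L$:

$$\hat{\mathsf{m}}(\mathsf{a}^{nR}(c_1 \times \mathsf{s}^n)) = \hat{\mathsf{m}}(\hat{\mathsf{x}}^n) = \arg\max_{i} \sum_{k=1}^{n} 1\big(|c_i(k) \times \hat{\mathsf{x}}_k| > L\big) \tag{45}$$

where $c_i \in \{0, 1\}^n$ is the codeword for message $i$ in the chosen codebook $C$ and $c_i(k) \in \{0, 1\}$ is the $k$-th entry
of the codeword $c_i$. Now we analyze the average error probability of the above coding system over all codebooks
according to the codebook distribution in (35) and over all Gaussian sequences $\mathsf{s}^n$. The average error probability
is hence, as shown in (36):

$$\begin{aligned}
e_{p,n}(R) &= \sum_{C} \Pr(\mathsf{C}_p = C) \Big(\frac{1}{2^{nR}} \sum_{m=1}^{2^{nR}} \Pr\big(m \ne \hat{\mathsf{m}}(\mathsf{a}^{nR}(c_m \times \mathsf{s}^n))\big)\Big) \\
&= \sum_{C} \Pr(\mathsf{C}_p = C) \Big(\Pr\big(1 \ne \hat{\mathsf{m}}(\mathsf{a}^{nR}(c_1 \times \mathsf{s}^n))\big)\Big) && (46) \\
&= \Pr\big(1 \ne \hat{\mathsf{m}}(\mathsf{a}^{nR}(\mathsf{c}_1 \times \mathsf{s}^n))\big) && (47)
\end{aligned}$$

where (46) follows from the symmetry of the system.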
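
The second step of the decoder $G_n$, rule (45), can be sketched as follows. The function names, the tie-breaking toward the smallest message index, and the toy codebook and estimate are assumptions made for illustration; the paper does not specify how ties are resolved.

```python
def score(codeword, xhat, L):
    """Number of positions k with |c(k) * xhat_k| > L, the quantity maximized in (45)."""
    return sum(1 for c, v in zip(codeword, xhat) if abs(c * v) > L)

def decode(codebook, xhat, L):
    """Return the 1-based message index whose codeword has the highest score.

    Ties are broken toward the smallest index (an assumption, not from the paper).
    """
    scores = [score(c, xhat, L) for c in codebook]
    return scores.index(max(scores)) + 1

# Toy example with three hypothetical codewords of length 4.
codebook = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0]]
xhat = [1.2, 0.05, -0.9, 0.0]     # hypothetical lossy estimate produced by g_n
decoded = decode(codebook, xhat, L=0.5)
```

Here the first codeword matches both large-magnitude entries of the estimate, so it collects the highest score and message 1 is decoded.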

We decompose (47) into four parts. We sketch the partitions, then give a detailed analysis.

1) The atypical behavior of the codeword $\mathsf{c}_1$. The typicality is defined in the usual way [4] for finite discrete random
sequences. The concentration theorem is well established in the literature.

2) The atypical behavior of $\tilde{\mathsf{s}}^{1(c_1)}$ while $c_1$ is typical, where $\tilde{\mathsf{s}}^{1(c_1)}$ is the non-zero subsequence of $\mathsf{x}^n = c_1 \times \mathsf{s}^n$,
$\tilde{\mathsf{s}}_1 = \mathsf{s}_{i_1}, \ldots, \tilde{\mathsf{s}}_{1(c_1)} = \mathsf{s}_{i_{1(c_1)}}$, and $i_1, \ldots, i_{1(c_1)}$ are the non-zero locations of $c_1$. The typicality for a
Gaussian $N(0, 1)$ sequence is defined in Appendix D. We prove the concentration result in Lemma 6.

3) The atypical behavior of the lossy source coding while both $c_1$ and $\tilde{\mathsf{s}}_1, \ldots, \tilde{\mathsf{s}}_{1(c_1)}$ are typical, i.e. the distortion
of the Bernoulli-Gaussian sequence $d(c_1 \times \mathsf{s}^n, \hat{\mathsf{x}}^n) = d(\mathsf{x}^n, \hat{\mathsf{x}}^n) > D$. The concentration of the typical behavior
of the lossy source coding is established in (9) for good lossy coders.

4) The probability that there exists a message $m$ that has a higher score than message 1 according to the decoding
rule in (45) while everything else (the codeword for message 1, $c_1$, the subsequence $\tilde{\mathsf{s}}_1, \ldots, \tilde{\mathsf{s}}_{1(c_1)}$ and the
distortion $d(c_1 \times \mathsf{s}^n, \hat{\mathsf{x}}^n)$) is typical. We bound this error by a union bound argument.

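The first part is the atypicality of the codeword for message 1, $c_1$; the second part is the error probability for
$c_1 \in B_\epsilon^n$, where

$$B_\epsilon^n = \left\{ b^n \in \{0,1\}^n : \left| \frac{1(b^n)}{n} - p \right| < \epsilon \right\}$$

and $1(b^n)$ is the number of 1's in $b^n$. Under the codebook probability $C_p$, all $c_i$'s are binary sequences of length $n$, distributed such that for all
$b^n \in \{0,1\}^n$:

$$\Pr^{C_p}(c_i = b^n) = p^{1(b^n)}(1-p)^{n-1(b^n)}, \quad i = 1, 2, \ldots, 2^{nR}, \qquad (48)$$

so we obviously have [4]:

$$\lim_{n \to \infty} \Pr^{C_p}(c_1 \notin B_\epsilon^n) = 0 \qquad (49)$$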
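As a hedged numerical illustration of the concentration in (49), the probability that an i.i.d. Bernoulli $p$ codeword falls outside $B_\epsilon^n$ can be computed exactly from the binomial distribution. The function names and the parameter values below are illustrative assumptions; log-domain arithmetic is used only to avoid floating-point underflow for large $n$.

```python
import math


def log_binom_pmf(n, k, p):
    """Natural log of Pr(Binomial(n, p) = k), computed via lgamma."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def prob_atypical(n, p, eps):
    """Exact Pr(|1(c)/n - p| >= eps) for c i.i.d. Bernoulli(p) of length n."""
    typical = sum(math.exp(log_binom_pmf(n, k, p))
                  for k in range(n + 1) if abs(k / n - p) < eps)
    return max(0.0, 1.0 - typical)


for n in (200, 2000):
    print(n, prob_atypical(n, p=0.1, eps=0.02))
```

For fixed $\epsilon$ the printed probability shrinks as $n$ grows, which is the behavior (49) asserts in the limit.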



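The second part is the atypicality of the Gaussian subsequence $\tilde{s}_1 = s_{i_1}, \ldots, \tilde{s}_{1(c_1)} = s_{i_{1(c_1)}}$, where $i_1, \ldots, i_{1(c_1)}$ are the non-zero
locations of $c_1$, while $c_1$ is typical ($c_1 \in B_\epsilon^n$). The typical Gaussian $N(0,1)$ set is defined as follows. First,
for a real sequence $s^n$ and $S, T$ s.t. $-\infty \le S < T \le \infty$, the $l$-th moment of the entries of $s^n$ within the interval
$[S,T]$ is denoted by

$$n_l(S,T) = \frac{1}{n} \sum_{k=1}^{n} 1(S \le s_k \le T)\, s_k^l.$$

Then the $\epsilon$-typical set for Gaussian $N(0,1)$ is defined as:

$$S_\epsilon(n) = \left\{ s^n : \max_{l \in \{0,1,2\}} \sup_{S,T} \left| n_l(S,T) - \int_S^T s^l \frac{1}{\sqrt{2\pi}} e^{-\frac{s^2}{2}} ds \right| < \epsilon \right\}$$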

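We prove the concentration result in Lemma 6 in Appendix D: $\lim_{n \to \infty} \Pr^{s}(s^n \notin S_\epsilon(n)) = 0$. $c_1$ and $s^n$ are
independent, and if $c_1 \in B_\epsilon^n$, then $1(c_1) > n(p - \epsilon)$, so if $n$ goes to infinity, $1(c_1)$ goes to infinity too, so

$$\begin{aligned}
\lim_{n \to \infty} \Pr\left(c_1 \in B_\epsilon^n, \tilde{s}^{1(c_1)} \notin S_\epsilon(1(c_1))\right) &\le \lim_{n \to \infty} \Pr\left(\tilde{s}^{1(c_1)} \notin S_\epsilon(1(c_1)) \mid c_1 \in B_\epsilon^n\right) \\
&= 0 \qquad (50)
\end{aligned}$$

where the inequality holds because a conditional probability is no smaller than the corresponding joint probability.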



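The third part is the atypical behavior of the lossy coding system. Following the definition of a good lossy
source coder in the strong sense, and the fact that $x^n = c_1 \times s^n$, we have, for all $\delta_1 > 0$:

$$\lim_{n \to \infty} \Pr^{C_p,s}\left(d(x^n, \hat{x}^n) > D + \delta_1\right) = \lim_{n \to \infty} \Pr^{C_p,s}\left(d(c_1 \times s^n, \hat{x}^n) > D + \delta_1\right) = 0$$

This implies that:

$$\lim_{n \to \infty} \Pr^{C_p,s}\left(c_1 \in B_\epsilon^n,\ \tilde{s}^{1(c_1)} \in S_\epsilon(1(c_1)),\ d(c_1 \times s^n, \hat{x}^n) > D + \delta_1\right) = 0 \qquad (51)$$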



The fourth part is the decoding error of the channel decoder following the decoding rule in (45) when the codeword $c_1$,
the Gaussian subsequence $\tilde{s}^{1(c_1)}$ and the distortion $d(c_1 \times s^n, \hat{x}^n)$ are all typical.

The output of the lossy source coding decoder is $\hat{x}^n = g_n(a^{nR}(c_1 \times s^n))$. From the decoding rule in (45), the
estimate of the message $\hat{m}(a^{nR}(c_1 \times s^n))$ is not equal to the true message 1 if and only if there exists a message
$m \ne 1$ such that

$$\sum_{k=1}^{n} 1(|c_m(k)\hat{x}_k| > L) \ge \sum_{k=1}^{n} 1(|c_1(k)\hat{x}_k| > L) \qquad (52)$$

Notice that the codebooks are symmetric in the messages, i.e. over all the codebooks, the probability that the
estimate of the message is $\hat{m} = i$ is equal to the probability that $\hat{m} = j$ for all $i, j \in \{1, 2, \ldots, 2^{nR}\}$ with $i \ne 1$,
$j \ne 1$. So we can union bound the decoding error probability of the event shown in (52) as follows:



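$$\Pr^{C_p,s}\left(1 \ne \hat{m}(a^{nR}(c_1 \times s^n))\right) \le 2^{nR} \Pr^{C_p,s}\left( \sum_{k=1}^{n} 1(|c_2(k)\hat{x}_k| > L) \ge \sum_{k=1}^{n} 1(|c_1(k)\hat{x}_k| > L) \right) \qquad (53)$$

where the probability is calculated over all possible codebooks under the measure $C_p$ and the Gaussian sequences
$s^n$. First, for a codeword $c_1$ and the lossy coding estimate $\hat{x}^n$ of $c_1 \times s^n$, denote by $u$ and $v$ the number of entries
$\hat{x}_k$ of the estimate with absolute value above $L$ where $c_1(k)$ is 1 and 0 respectively:

$$\begin{aligned}
u &= \sum_{k=1}^{n} 1(|c_1(k)\hat{x}_k| > L) \\
v &= \sum_{k=1}^{n} 1(|\hat{x}_k| > L)\, 1(c_1(k) = 0). \qquad (54)
\end{aligned}$$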



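With $u$ and $v$ fixed (here we fix the codeword $c_1$, the sequence $s^n$ and the estimate $\hat{x}^n$), we union bound the
probability of the event that there exists a message $m \ne 1$ such that (52) is true:

$$\begin{aligned}
\Pr^{C_p}\left(1 \ne \hat{m}(a^{nR}(c_1 \times s^n)) \mid c_1 = c_1, s^n = s^n\right) &\le 2^{nR} \Pr\left( \sum_{k=1}^{n} 1(|c_2(k)\hat{x}_k| > L) \ge \sum_{k=1}^{n} 1(|c_1(k)\hat{x}_k| > L) \right) \\
&= 2^{nR} \Pr\left( \sum_{k=1}^{n} 1(|c_2(k)\hat{x}_k| > L) \ge u \right) \qquad (55) \\
&= 2^{nR} \sum_{i=u}^{u+v} \binom{u+v}{i} p^i (1-p)^{u+v-i} \qquad (56) \\
&\le 2^{nR} \times n \max_{u \le i \le u+v} \left\{ \binom{u+v}{i} p^i (1-p)^{u+v-i} \right\} \qquad (57) \\
&\le 2^{nR} \times n\, 2^{-(u+v) \min_{u \le i \le u+v} D\left(\frac{i}{u+v} \| p\right)} \qquad (58) \\
&= \begin{cases} n 2^{nR}, & \text{if } \frac{u}{u+v} < p \\ n 2^{nR - (u+v) D\left(\frac{u}{u+v} \| p\right)}, & \text{if } \frac{u}{u+v} \ge p \end{cases} \qquad (59)
\end{aligned}$$

(55) follows from the definition of $u$. (56) holds because $c_2 \in \{0,1\}^n$ is an i.i.d. Bernoulli $p$ sequence and only the
$u+v$ positions with $|\hat{x}_k| > L$ can contribute to the count. (57) is because $u + v \le n$. (58) and (59) follow from basic
information theoretic inequalities [5]. From the lemma in Appendix E we know that $(u+v)D\left(\frac{u}{u+v} \| p\right)$ is monotonically
increasing with $u$ and monotonically decreasing with $v$. $\frac{u}{u+v}$ is also monotonically increasing with $u$ and monotonically
decreasing with $v$, so the expression in (59) is monotonically decreasing with $u$ and monotonically increasing with $v$.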
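The binomial tail bound behind (56)-(59) can be checked numerically. The sketch below compares the exact tail $\Pr(\mathrm{Binomial}(N,p) \ge u)$, with $N$ playing the role of $u+v$, against the standard Chernoff-type bound $2^{-N D(u/N \| p)}$. Note that this is the bound without the polynomial factor $n$ used in (57)-(59), so it is slightly tighter than what the proof needs; the function names and test values are illustrative assumptions.

```python
import math


def kl_bernoulli(q, p):
    """Binary KL divergence D(q || p) in bits, with 0 * log 0 taken as 0."""
    total = 0.0
    if q > 0:
        total += q * math.log2(q / p)
    if q < 1:
        total += (1 - q) * math.log2((1 - q) / (1 - p))
    return total


def binomial_tail(N, u, p):
    """Exact Pr(Binomial(N, p) >= u)."""
    return sum(math.comb(N, i) * p**i * (1 - p)**(N - i)
               for i in range(u, N + 1))


def kl_tail_bound(N, u, p):
    """2^{-N D(u/N || p)} when u/N >= p, and the trivial bound 1 otherwise."""
    q = u / N
    return 1.0 if q < p else 2.0 ** (-N * kl_bernoulli(q, p))


N, p = 60, 0.1
for u in (6, 12, 24, 48):
    print(u, binomial_tail(N, u, p), kl_tail_bound(N, u, p))
```

The exponent grows with $u$ for fixed $N$, matching the monotonicity claim used after (59).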
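(59) is true for all codewords $c_1$ and sequences $\tilde{s}^{1(c_1)}$, typical or not. So it is also true for all those $c_1 \in B_\epsilon^n$,
$\tilde{s}^{1(c_1)} \in S_\epsilon(1(c_1))$ and $d(c_1 \times s^n, \hat{x}^n) \le D + \delta_1$. In this case, we can give a feasible region for $u$ and $v$, and then
give a bound on (59). We further investigate the distortion for the said typical sequences:

$$\begin{aligned}
n(D + \delta_1) &\ge n\, d(c_1 \times s^n, \hat{x}^n) \\
&= \sum_{k=1}^{n} (c_1(k)s_k - \hat{x}_k)^2 \\
&= \sum_{k: c_1(k)=1} (c_1(k)s_k - \hat{x}_k)^2 + \sum_{k: c_1(k)=0} \hat{x}_k^2 \\
&= \sum_{k: c_1(k)=1} (c_1(k)s_k - \hat{x}_k)^2 + \sum_{k: c_1(k)=0, |\hat{x}_k| > L} \hat{x}_k^2 + \sum_{k: c_1(k)=0, |\hat{x}_k| \le L} \hat{x}_k^2 \\
&\ge \sum_{k: c_1(k)=1} (c_1(k)s_k - \hat{x}_k)^2 + vL^2 \qquad (60)
\end{aligned}$$

where (60) follows from the definition of $v$. Notice that by definition $x_k = c_1(k)s_k$, so $|x_k| > 0$ implies that $c_1(k) = 1$, and
the first term of (60) satisfies:

$$\sum_{k: c_1(k)=1} (x_k - \hat{x}_k)^2 \ge \sum_{k: |x_k| > L \ge |\hat{x}_k|} (x_k - \hat{x}_k)^2.$$

Since $|x_k - \hat{x}_k| \ge |x_k| - L$ whenever $|x_k| > L \ge |\hat{x}_k|$, we rewrite (60) as:

$$n(D + \delta_1) \ge \sum_{k: |x_k| > L \ge |\hat{x}_k|} (|x_k| - L)^2 + vL^2 \qquad (61)$$

From the definition of $u$, we know that $u = \sum_{k=1}^{n} 1(|c_1(k)\hat{x}_k| > L)$, hence

$$\begin{aligned}
\sum_{k=1}^{n} 1(|x_k| > L \ge |\hat{x}_k|) &\ge \sum_{k=1}^{n} 1(|x_k| > 0) - \sum_{k=1}^{n} 1(0 < |x_k| \le L) - \sum_{k=1}^{n} 1(|c_1(k)\hat{x}_k| > L) \\
&= \sum_{k=1}^{n} 1(|x_k| > L) - u \\
&\triangleq n(|x_k| > L) - u \qquad (62)
\end{aligned}$$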

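Recall that $\tilde{s}_1, \ldots, \tilde{s}_{1(c_1)}$ are the non-zero entries of $x^n$. Without loss of generality, let $|\tilde{s}_1|, \ldots, |\tilde{s}_{n(|x_k|>L)-u}|$
be the smallest $n(|x_k| > L) - u$ many $|x_k|$'s that are larger than $L$, ordered such that $|\tilde{s}_1| \ge \ldots \ge
|\tilde{s}_{n(|x_k|>L)-u}| \ge L$. Then substituting (62) into (61) and denoting $U = |\tilde{s}_1|$, we have:

$$\begin{aligned}
n(D + \delta_1) &\ge \sum_{j=1}^{n(|x_k|>L)-u} (|\tilde{s}_j| - L)^2 + vL^2 \\
&= \sum_{j: L < |\tilde{s}_j| \le U} (|\tilde{s}_j| - L)^2 + vL^2 \qquad (63) \\
&= \sum_{j: L < |\tilde{s}_j| \le U} (|\tilde{s}_j|^2 - 2L|\tilde{s}_j| + L^2) + vL^2 \\
&\ge 2 \times 1(c_1) \left( \int_L^U (s-L)^2 \frac{1}{\sqrt{2\pi}} e^{-\frac{s^2}{2}} ds - \epsilon(1+L)^2 \right) + vL^2 \qquad (64) \\
&\ge 2 \times n(p-\epsilon) \left( \int_L^U (s-L)^2 \frac{1}{\sqrt{2\pi}} e^{-\frac{s^2}{2}} ds - \epsilon(1+L)^2 \right) + vL^2 \qquad (65) \\
&\ge n \left( 2p \int_L^U (s-L)^2 \frac{1}{\sqrt{2\pi}} e^{-\frac{s^2}{2}} ds + \frac{v}{n}L^2 \right) - n\epsilon K_1(p,L) \qquad (66)
\end{aligned}$$

(63) follows from the definition of $\tilde{s}^{1(c_1)}$. (64) is true because $\tilde{s}^{1(c_1)} \in S_\epsilon(1(c_1))$ is $\epsilon$-typical Gaussian $N(0,1)$. (65) is
true because $c_1 \in B_\epsilon^n$. Finally, in (66), $K_1(p,L)$ is a finite function of $p$ and $L$; we do not need $U$ in the picture
because we can replace $U$ with $\infty$ when bounding the residue. We rewrite (66) as:

$$2p \int_L^U (s-L)^2 \frac{1}{\sqrt{2\pi}} e^{-\frac{s^2}{2}} ds + \frac{v}{n}L^2 \le D + \delta_1 + \epsilon K_1(p,L) \qquad (67)$$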

Meanwhile, because $U = |\tilde{s}_1| \ge \ldots \ge |\tilde{s}_{n(|x_k|>L)-u}| \ge L$ are the smallest $n(|x_k| > L) - u$ many $|x_k|$'s that are
larger than $L$, and $\tilde{s}^{1(c_1)}$ is an $\epsilon$-typical Gaussian sequence, $n(|x_k| > L) - u \le 1(c_1)(\Pr(L < |s| < U) + \epsilon)$, hence:

$$\begin{aligned}
u &\ge n(|x_k| > L) - 1(c_1)(\Pr(L < |s| < U) + \epsilon) \\
&\ge n(p - \epsilon)(\Pr(|s| > L) - \epsilon) - n(p + \epsilon)(\Pr(L < |s| < U) + \epsilon) \\
&= np\Pr(|s| > U) - n\epsilon K_2(p,L) \qquad (68)
\end{aligned}$$

The above analysis is true for all $\delta_1$ and $\epsilon$. Letting both be small, we have

$$u \ge n(p\Pr(|s| > U) - \epsilon_2) \qquad (69)$$

$$\text{s.t.: } 2p \int_L^U (s-L)^2 \frac{1}{\sqrt{2\pi}} e^{-\frac{s^2}{2}} ds + \frac{v}{n}L^2 \le D \qquad (70)$$

where $\lim_{\delta_1, \epsilon \to 0} \epsilon_2 = 0$. This is true because any $U$ that satisfies (67) either also satisfies the more stringent
constraint in (70), or the gap between $U$ and the biggest $U$ that satisfies (70) is small when $\delta_1$ and $\epsilon$ are small.
The claim then follows from the continuity of $\Pr(|s| > U)$ in $U$.

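Notice that (59) holds for all codewords $c_1$ and $s^n$; in particular it is true for the typical ones, $c_1 \in B_\epsilon^n$,
$\tilde{s}^{1(c_1)} \in S_\epsilon(1(c_1))$ and $d(c_1 \times s^n, \hat{x}^n) \le D + \delta_1$. Also, (59) is monotonically decreasing with $u$. With (69), and letting
$r = \frac{v}{n}$ (recall the definition of $v$ in (54); for $c_1 \in B_\epsilon^n$, $v \le n - n(p - \epsilon) = n(1 - p + \epsilon)$, or equivalently $r \in [0, 1-p]$ as $\epsilon \to 0$),
we rewrite (59):

$$\Pr\left(1 \ne \hat{m}(a^{nR}(c_1 \times s^n)) \mid c_1 = c_1 \in B_\epsilon^n,\ s^n = s^n : \tilde{s}^{1(c_1)} \in S_\epsilon(1(c_1)),\ d(c_1 \times s^n, \hat{x}^n) \le D + \delta_1\right)$$
$$\le \begin{cases} n 2^{n(R + \epsilon_3)}, & \text{if } \frac{p\Pr(|s|>U)}{p\Pr(|s|>U)+r} < p \\ n 2^{n\left(R - (p\Pr(|s|>U)+r) D\left(\frac{p\Pr(|s|>U)}{p\Pr(|s|>U)+r} \| p\right) + \epsilon_3\right)}, & \text{if } \frac{p\Pr(|s|>U)}{p\Pr(|s|>U)+r} \ge p \end{cases} \qquad (71)$$

with (70) being satisfied, where $\lim_{\delta_1, \epsilon \to 0} \epsilon_3 = 0$: because the exponent in (71) is continuous in $u$ and we know that
$\lim \epsilon_2 = 0$, we have $\lim \epsilon_3 = 0$ as well.
Notice that the coding system can pick an arbitrary $L$. Picking the best possible $L$, we have: if

$$R < \bar{R} = \max_{L > 0} \left\{ \min_{U \ge L,\ r \in [0, 1-p]:\ T_1(L,U,r) \le D} h(L,U,r) \right\}$$

where $T_1(L,U,r)$ is the left-hand side of (70) with $r = \frac{v}{n}$ and

$$h(L,U,r) = \begin{cases} (p\Pr(|s|>U)+r)\, D\left(\frac{p\Pr(|s|>U)}{p\Pr(|s|>U)+r} \| p\right), & \text{if } \frac{p\Pr(|s|>U)}{p\Pr(|s|>U)+r} \ge p \\ 0, & \text{if } \frac{p\Pr(|s|>U)}{p\Pr(|s|>U)+r} < p \end{cases}$$

then

$$\lim_{n \to \infty} \Pr\left(1 \ne \hat{m}(a^{nR}(c_1 \times s^n)) \mid c_1 = c_1 \in B_\epsilon^n,\ s^n = s^n : \tilde{s}^{1(c_1)} \in S_\epsilon(1(c_1)),\ d(c_1 \times s^n, \hat{x}^n) \le D + \delta_1\right) = 0 \qquad (72)$$

The above limit holds for all those $c_1 \in B_\epsilon^n$, $\tilde{s}^{1(c_1)} \in S_\epsilon(1(c_1))$ and $d(c_1 \times s^n, \hat{x}^n) \le D + \delta_1$, so

$$\lim_{n \to \infty} \Pr\left(1 \ne \hat{m}(a^{nR}(c_1 \times s^n)),\ c_1 \in B_\epsilon^n,\ \tilde{s}^{1(c_1)} \in S_\epsilon(1(c_1)),\ d(c_1 \times s^n, \hat{x}^n) \le D + \delta_1\right) = 0 \qquad (73)$$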

n — >oo 

Finally we can upper bound the overall error probability of the randomized coding scheme. The decoding error
$e_{p,n}(R)$ is defined in (36), which is equivalent to (46) because of the symmetry. We decompose the error event into
4 atypical events as illustrated at the beginning of the proof. For any $R < \bar{R}$,

$$\begin{aligned}
e_{p,n}(R) &= \Pr\left(1 \ne \hat{m}(a^{nR}(c_1 \times s^n))\right) \qquad (74) \\
&\le \Pr(c_1 \notin B_\epsilon^n) \\
&\quad + \Pr\left(c_1 \in B_\epsilon^n,\ \tilde{s}^{1(c_1)} \notin S_\epsilon(1(c_1))\right) \\
&\quad + \Pr\left(c_1 \in B_\epsilon^n,\ \tilde{s}^{1(c_1)} \in S_\epsilon(1(c_1)),\ d(c_1 \times s^n, \hat{x}^n) > D + \delta_1\right) \\
&\quad + \Pr\left(1 \ne \hat{m}(a^{nR}(c_1 \times s^n)),\ c_1 \in B_\epsilon^n,\ \tilde{s}^{1(c_1)} \in S_\epsilon(1(c_1)),\ d(c_1 \times s^n, \hat{x}^n) \le D + \delta_1\right) \qquad (75)
\end{aligned}$$

where (74) follows from (46). The four terms in (75) are shown in (49), (50), (51) and (73)
respectively to be arbitrarily small, so we can finally claim that, for a good lossy source coding system in
the strong sense with distortion constraint $D$, the randomized channel coding error converges to zero as $n$ goes to
infinity:

$$\lim_{n \to \infty} e_{p,n}(R) = 0$$

This concludes the proof of Lemma 5. □

VII. Discussions and Numerical Result 

Now we have two upper bounds and two lower bounds on the rate distortion function $R(D,p)$. We reiterate the
bounds:

$$R(D,p) \le H(p) + pR(D, N(0,p)) \qquad (76)$$
$$R(D,p) \le R(D, N(0,p)) \qquad (77)$$
$$R(D,p) \ge pR(D, N(0,p)) \qquad (78)$$
$$R(D,p) \ge pR(D, N(0,p)) + \max_{L > 0} \left\{ \min_{U \ge L,\ r \in [0,1-p]:\ T_1(L,U,r) \le D} h(L,U,r) \right\} = pR(D, N(0,p)) + R_1(D,p) \qquad (79)$$

where

$$h(L,U,r) = \begin{cases} (p\Pr(|s|>U)+r)\, D\left(\frac{p\Pr(|s|>U)}{p\Pr(|s|>U)+r} \| p\right), & \text{if } \frac{p\Pr(|s|>U)}{p\Pr(|s|>U)+r} \ge p \\ 0, & \text{if } \frac{p\Pr(|s|>U)}{p\Pr(|s|>U)+r} < p, \end{cases}$$

$s$ is Gaussian $N(0,1)$ and

$$T_1(L,U,r) = rL^2 + 2p \int_L^U (s-L)^2 \frac{1}{\sqrt{2\pi}} e^{-\frac{s^2}{2}} ds, \qquad (80)$$

and where $R(D, N(0,p))$ is the rate distortion function for a zero-mean, variance-$p$ Gaussian random sequence with
distortion constraint $D$, $R(D, N(0,p)) = \max\{0, \frac{1}{2}\log_2\frac{p}{D}\}$. (76), (77) and (78) are derived in Propositions 2, 3
and 4 respectively; (79) is the main result in Theorem 1.
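The three closed-form bounds (76)-(78) are easy to evaluate. The following is a minimal sketch, assuming rates in bits and using hypothetical function names; it does not compute the improvement term $R_1(D,p)$ in (79), which requires the max-min optimization.

```python
import math


def binary_entropy(p):
    """H(p) in bits for 0 < p < 1."""
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))


def gaussian_rd(D, var):
    """R(D, N(0, var)) = max{0, 0.5 * log2(var / D)}."""
    return max(0.0, 0.5 * math.log2(var / D))


def simple_bounds(D, p):
    """Closed-form bounds (76), (77) and (78) on R(D, p)."""
    r_gauss = gaussian_rd(D, p)
    return {
        "upper_76": binary_entropy(p) + p * r_gauss,
        "upper_77": r_gauss,
        "lower_78": p * r_gauss,
    }


print(simple_bounds(D=0.01, p=0.1))
```

For $D < p$ the gap between (76) and (78) is exactly $H(p)$, which is the gap the improvement term in (79) is meant to narrow.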

A. Properties of the improvement $R_1(D,p)$

The improvement of our new lower bound, the second term $R_1(D,p)$ in (79), has a game theoretic interpretation.
In a two player zero sum game, the first player (the coding system) chooses $L$, the second player (the adversary) chooses
$U$ and $r$ subject to the constraint in (80), and the payoff to player one is $h(L,U,r)$. First we argue that the improvement
of our lower bound, the second term $R_1(D,p)$ in (79), is monotonically decreasing with $D$; in particular, if the
improvement is zero for some $D$, it is zero for all larger $D$.

Corollary 3: $R_1(D,p)$ is monotonically decreasing with $D$, i.e. for $D_1 > D_2$, $R_1(D_1,p) \le R_1(D_2,p)$.
Proof: $R_1(D,p)$ is of the form

$$\max_{L > 0} \left\{ \min_{U \ge L,\ r \in [0,1-p]:\ T_1(L,U,r) \le D} h(L,U,r) \right\},$$

so for all $L > 0$, if the pair $(U, r)$ is feasible for $D_2$, it is also feasible for $D_1$; hence the minimum of $h(L,U,r)$
for $D_1$ is no bigger than that for $D_2$. □
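The max-min structure of $R_1(D,p)$ and the monotonicity in Corollary 3 can be explored with a coarse grid search. The sketch below is an approximation under stated assumptions: the grids for $L$, $U$ and $r$ are arbitrary choices, the point $U = L$ is always included so that the feasible set is non-empty, and the integral in $T_1$ is evaluated with the closed form $\int_L^U (s-L)^2 \phi(s)\,ds = (1+L^2)(\Phi(U)-\Phi(L)) + (2L-U)\phi(U) - L\phi(L)$, where $\phi$ and $\Phi$ are the standard normal pdf and cdf. It is not the procedure used to produce the paper's figures.

```python
import math


def phi(s):
    """Standard normal pdf."""
    return math.exp(-0.5 * s * s) / math.sqrt(2.0 * math.pi)


def Phi(s):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(s / math.sqrt(2.0)))


def kl_bits(q, p):
    """Binary KL divergence D(q || p) in bits."""
    total = 0.0
    if q > 0.0:
        total += q * math.log2(q / p)
    if q < 1.0:
        total += (1.0 - q) * math.log2((1.0 - q) / (1.0 - p))
    return total


def T1(L, U, r, p):
    """Distortion constraint function T_1(L, U, r)."""
    integral = ((1.0 + L * L) * (Phi(U) - Phi(L))
                + (2.0 * L - U) * phi(U) - L * phi(L))
    return r * L * L + 2.0 * p * integral


def h(L, U, r, p):
    """Payoff h(L, U, r); L enters only through the constraint."""
    a = p * 2.0 * (1.0 - Phi(U))  # p * Pr(|s| > U)
    if a + r <= 0.0:
        return 0.0
    q = a / (a + r)
    return (a + r) * kl_bits(q, p) if q >= p else 0.0


def R1_grid(D, p, Ls, Us, rs):
    """Grid approximation of max over L of min over feasible (U, r)."""
    best = 0.0
    for L in Ls:
        candidates = [L] + [U for U in Us if U > L]
        worst = min(
            (h(L, U, r, p) for U in candidates for r in rs
             if T1(L, U, r, p) <= D),
            default=None,
        )
        if worst is not None:
            best = max(best, worst)
    return best


p = 0.1
Ls = [0.1 * i for i in range(1, 21)]
Us = [0.1 * i for i in range(1, 61)]
rs = [0.9 * j / 20 for j in range(21)]
for D in (0.05, 0.01):
    print(D, R1_grid(D, p, Ls, Us, rs))
```

Because the feasible set on a fixed grid only shrinks as $D$ decreases, the grid value is non-increasing in $D$, mirroring the proof of Corollary 3.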

More importantly, the improvement is within $[0, H(p)]$ in light of the upper bound in (76). In the low distortion
regime, i.e. $\frac{D}{p} \ll 1$, we argue that the improvement $R_1(D,p)$ is close to $p\log_2\frac{1}{p}$.

Corollary 4: Asymptotic behavior of $R_1(D,p)$ in the low distortion regime: for any $p > 0$,

$$\lim_{D \to 0} R_1(D,p) = p\log_2\frac{1}{p}$$

Proof: We only give a sketch of the proof here. The coding system picks a positive $L \ll 1$ but $\gg D$, say
$L = D^{0.3}$. The distortion constraint on $T_1(L,U,r)$ implies that $D \ge rL^2$, hence

$$r \le \frac{D}{L^2} = D^{0.4}.$$

So $r$ goes to zero as $D$ goes to zero. Similarly we argue that $U$ goes to zero as $D$ goes to zero. In light of the
distortion constraint and the choice $L = D^{0.3}$, and the obvious inequality $-2sL \ge -\frac{3s^2}{4} - 4L^2$ for all
$s$ and $L$:

$$\frac{D}{2p} \ge \int_L^U (s-L)^2 \frac{1}{\sqrt{2\pi}} e^{-\frac{s^2}{2}} ds \ge \int_L^U \left(\frac{s^2}{4} - 3L^2\right) \frac{1}{\sqrt{2\pi}} e^{-\frac{s^2}{2}} ds = \int_{D^{0.3}}^U \left(\frac{s^2}{4} - 3D^{0.6}\right) \frac{1}{\sqrt{2\pi}} e^{-\frac{s^2}{2}} ds$$

hence:

$$\int_{D^{0.3}}^U \frac{s^2}{4} \frac{1}{\sqrt{2\pi}} e^{-\frac{s^2}{2}} ds \le \frac{D}{2p} + \int_{D^{0.3}}^U 3D^{0.6} \frac{1}{\sqrt{2\pi}} e^{-\frac{s^2}{2}} ds \le \frac{D}{2p} + 3D^{0.6}$$

Take the limit on both sides as $D \to 0$: the right hand side is 0, and the left hand side is zero if and only if $U \to 0$ as
$D$ goes to zero. We just showed that if we pick $L = D^{0.3}$ and $D$ goes to zero, then both $U$ and $r$ go to zero if
the distortion constraint is to be satisfied. This means that in this case:

$$\lim_{D \to 0} R_1(D,p) = \lim_{r, U \to 0} (p \times \Pr(|s| > U) + r)\, D\left(\frac{p \times \Pr(|s| > U)}{p \times \Pr(|s| > U) + r} \| p\right) = pD(1 \| p) = p\log_2\frac{1}{p}$$

□

A simple corollary of Corollary 4 is as follows. For small $p$, i.e. the sparse signals studied in the compressive sensing
literature:

$$H(p) = p\log_2\left(\frac{1}{p}\right) + (1-p)\log_2\left(\frac{1}{1-p}\right) \approx p\log_2\left(\frac{1}{p}\right) + \log_2(e)\,p$$

So the gap between the improved lower bound in (79) and the upper bound in (76) is at most $\log_2(e)\,p$, which is
dominated by the improvement $p\log_2\frac{1}{p}$ for small $p$.
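A quick numerical check of this approximation, as a sketch with hypothetical function names: the residual $H(p) - p\log_2\frac{1}{p} = (1-p)\log_2\frac{1}{1-p}$ is compared against $\log_2(e)\,p$ and against the improvement $p\log_2\frac{1}{p}$ for a few values of $p$.

```python
import math


def binary_entropy(p):
    """H(p) in bits for 0 < p < 1."""
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))


def gap_terms(p):
    """Return (improvement, residual) with residual = H(p) - p*log2(1/p)."""
    improvement = p * math.log2(1 / p)
    residual = binary_entropy(p) - improvement
    return improvement, residual


for p in (0.1, 0.01, 0.001):
    improvement, residual = gap_terms(p)
    print(p, improvement, residual, p * math.log2(math.e), residual / improvement)
```

The last printed column is the ratio of the remaining gap to the improvement; it shrinks as $p$ decreases, which is the sense in which the improvement dominates for sparse signals.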
B. Numerical Results

We plot the bounds in (76)-(79) for $p = 0.1$. As shown in Figure 4, the rate distortion function $R(D,p)$ is
bounded by the lower and upper bounds in (76)-(79).

Fig. 4. Lower and upper bounds on $R(D,p)$ for $p = 0.1$ at high distortion levels; the distortion $D$ runs from 0.005 to 0.1.



VIII. Conclusions and Future Work 

In this paper we study the rate distortion function for Bernoulli-Gaussian sequences. The main result is an
improved lower bound on the rate distortion function. The improvement over the best known lower bound is
$p\log_2\frac{1}{p}$ if $D$ is small. This is significant since the currently known gap between the lower bound and upper bound
is $H(p)$, hence the improved lower bound is almost tight for sparse signals where $p \ll 1$. To derive this lower
bound, we develop a new technique to lower bound part of the rate distortion function through a randomized lossy
coding channel. This is, to our knowledge, the first work on this topic. This new lower bound and the obvious upper
bounds do not match. The lower bounding technique we use in this paper can be improved if we can relax the
near-zero error probability constraint on the randomized channel coding. A potentially useful direction is to replace
the channel coding part with a lossy source coder. This is left for future work. There is another interesting result
we developed on the way to proving the main result: we showed the equivalence of the rate distortion functions in
the strong sense and the expected distortion sense for continuous random variables with finite variances.

Fig. 5. The improvement $R_1(D,p)$ for $p = 0.1$ at low distortion levels. As proved in Corollary 4, $R_1(D,p) \to p\log_2\frac{1}{p}$ as $D \to 0$.



References 

[1] Mukul Agarwal, Anant Sahai, and Sanjoy Mitter. Coding into a source: a direct inverse rate-distortion theorem. Allerton Conference, pages 569-578, 2006.
[2] Toby Berger. Rate Distortion Theory: A mathematical basis for data compression. Prentice-Hall, 1971.
[3] Emmanuel Candes and Terence Tao. Near optimal signal recovery from random projections: Universal encoding strategies? IEEE Transactions on Information Theory, 52:5406-5425, 2006.
[4] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. John Wiley and Sons Inc., New York, 1991.
[5] Imre Csiszar and Janos Korner. Information Theory. Akademiai Kiado, Budapest, 1986.
[6] David Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52:1289-1306, 2006.
[7] Alyson K. Fletcher, Sundeep Rangan, and Vivek K. Goyal. On the rate-distortion performance of compressed sensing. Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Process., pages 885-888, 2007.
[8] Alyson K. Fletcher, Sundeep Rangan, Vivek K. Goyal, and Kannan Ramchandran. Denoising by sparse approximation: Error bounds based on rate-distortion theory. EURASIP Journal on Applied Signal Processing, pages 1-19, 2006.
[9] Vivek K. Goyal, Alyson K. Fletcher, and Sundeep Rangan. Compressive sampling and lossy compression. IEEE Signal Processing Magazine, 25:48-56, 2008.
[10] Claude Shannon. A mathematical theory of communication. Bell System Technical Journal, 27, 1948.

Appendix 

A. Rate distortion function in the strong sense for continuous random variables 

It is shown that the rate distortion functions for the average distortion and the strong distortion are the same
for discrete random variables (Chapter 13.6 of [4]). However, it is not obvious whether this is also true for continuous random
variables. In this section, we give a sketch of why it is also true for continuous (mixed) random variables. Since
we have not seen similar results in the classic literature on rate distortion functions [?] [2] and [?], we feel it is
necessary to give a sketch of the proof here.

As shown in Figure 6, to make it more general, we let $x$ be a mixture of a continuous probability density function $p(x)$
and finitely many discrete values with positive probabilities ($\Pr(x = a_i) = p_i > 0$, shown as impulses in the figure).
We need the mean and the variance of $x$ to be finite: $E(x) < \infty$ and $E(x^2) < \infty$.

Fig. 6. Probability density function $p(x)$ of a continuous random variable $x$

First, we argue that the rate distortion function in the expected distortion sense exists for the mixed random
variables by approximating the impulses in the pdf by a sharp step function (see footnote 1), so we have a continuous pdf and the
rate distortion theorem can be applied. It remains to be shown that the continuous rate distortion function converges
to the one for $x$ as $m \to \infty$. This can be easily proved by noticing that the approximation error of
this approximation vanishes as $m \to \infty$, hence the rate distortion function of the continuous random variable converges to the mixed
one.

Now we show that the rate distortion function in the strong sense for a continuous (mixed) random variable $x$,
denoted by $R_S(D,x)$, is equal to the rate distortion function in the expected sense, denoted by $R_E(D,x)$.




Fig. 7. Quantization of a probability density function $p(x)$ of a mixed random variable $x$: 7 level quantization for the continuous part and
exact representation of the discrete part.

As shown in Figure 7, for the continuous part of the probability density function, we quantize the real line into
$(2K+1)$ quantization levels with interval size $u$. The intervals are: $[-Ku, -(K-1)u], \ldots, [-u, 0], [0, u], \ldots, [(K-1)u, Ku]$
and the "tail" interval $(-\infty, -Ku] \cup [Ku, \infty)$. For each interval, the representation value is the middle
point of the interval; specifically, for the "tail" interval, the representation value is 0. We use the following function
$q_{K,u}$ to map a mixed random variable to a discrete random variable:

$$q_{K,u}(x) = \begin{cases} x, & P_x(x) > 0, \\ (k + \frac{1}{2})u, & P_x(x) = 0 \text{ and } x \in [ku, (k+1)u),\ k = -K, \ldots, K-1, \\ 0, & P_x(x) = 0 \text{ and } x \in (-\infty, -Ku] \cup [Ku, \infty) \end{cases}$$

For a random variable $x$, the output of the map $y_{K,u} = q_{K,u}(x)$ is a discrete random variable. Hence we know
that the rate distortion functions in the strong sense, denoted by $R_S(D, y_{K,u})$, and in the expected distortion sense,
denoted by $R_E(D, y_{K,u})$, are the same.
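The quantizer $q_{K,u}$ can be sketched as follows. This is a minimal illustration under stated assumptions: the set of atoms (values with positive probability) is passed in as a hypothetical `atoms` argument, the boundary point $x = -Ku$ is assigned to the lowest bin rather than to the tail (a measure-zero choice), and the bin index is clamped to guard against floating-point rounding.

```python
import math


def quantize(x, K, u, atoms=frozenset()):
    """q_{K,u}: atoms pass through, bins map to midpoints, tail maps to 0."""
    if x in atoms:
        return x
    if x < -K * u or x >= K * u:
        return 0.0  # "tail" interval
    # Bin index k such that x lies in [k*u, (k+1)*u), clamped to -K..K-1.
    k = min(max(math.floor(x / u), -K), K - 1)
    return (k + 0.5) * u


K, u = 4, 0.5
for x in (0.3, -0.3, 1.99, 5.0):
    print(x, quantize(x, K, u))
```

Inside $[-Ku, Ku)$ the quantization error is at most $\frac{u}{2}$, which is the property used in the bounds that follow.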

Now we have four rate distortion functions: the rate distortion functions for the mixed (continuous) random
variable $x$, $R_S(D,x)$ and $R_E(D,x)$, and the rate distortion functions for the quantized discrete random variable,
$R_S(D, y_{K,u})$ and $R_E(D, y_{K,u})$. The goal is to show that $R_S(D,x) = R_E(D,x)$. First, from the discussion in
Appendix B we know that $R_S(D,x) \ge R_E(D,x)$. It remains to be shown that $R_S(D,x) \le R_E(D,x)$. We will
use the rate distortion functions of the discrete random variable $y_{K,u}$ as bridges to show that. We will show that when
$u \to 0$ and $Ku \to \infty$: $R_S(D,x) \le R_S(D, y_{K,u})$ and $R_E(D, y_{K,u}) \le R_E(D,x)$. And knowing that for discrete
random variables $y_{K,u}$, $R_S(D, y_{K,u}) = R_E(D, y_{K,u})$, we will have:

$$R_S(D,x) \le R_S(D, y_{K,u}) = R_E(D, y_{K,u}) \le R_E(D,x).$$

This will conclude our proof that $R_S(D,x) = R_E(D,x)$. Now we only need to show that $R_S(D,x) \le
R_S(D, y_{K,u})$ and $R_E(D, y_{K,u}) \le R_E(D,x)$.

Footnote 1: For an impulse $\Pr(x = a_i) = p_i > 0$, we add to the continuous pdf $p(x)$ the following step function $p_i(x)$: $p_i(x) = m$ if
$x \in [a_i - \frac{p_i}{2m}, a_i + \frac{p_i}{2m}]$, $p_i(x) = 0$ otherwise.

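1) $R_S(D,x) \le R_S(D, y_{K,u})$: We only need to show that if at a rate-distortion pair $(R, D)$ there is a good lossy
coder $f_{n,K,u}, g_{n,K,u}$ in the strong sense for $y_{K,u}$, then there is a good lossy coder $f_n, g_n$ in the strong sense for $x$.

From the definition of the good lossy coder in the strong sense, we know that for any $\delta_0 > 0$:

$$\lim_{n \to \infty} \Pr\left(d(y_{K,u}^n, g_{n,K,u}(f_{n,K,u}(y_{K,u}^n))) > D + \delta_0\right) = 0$$

Notice that $y_{K,u}^n = q_{K,u}(x^n)$, so the above equation becomes:

$$\lim_{n \to \infty} \Pr\left(d(q_{K,u}(x^n), g_{n,K,u}(f_{n,K,u}(q_{K,u}(x^n)))) > D + \delta_0\right) = 0 \qquad (81)$$

where the quantizer $q_{K,u}(\cdot)$ is illustrated in Figure 7. Now we show that the following encoder decoder pair $\tilde{f}_{n,K,u}, \tilde{g}_{n,K,u}$
is good in the strong sense for $x$ when $u$ goes to zero and $Ku$ goes to infinity, where

$$\tilde{f}_{n,K,u}(\cdot) = f_{n,K,u}(q_{K,u}(\cdot)), \text{ and } \tilde{g}_{n,K,u}(\cdot) = g_{n,K,u}(\cdot).$$

Notice that the distortion $d(\cdot,\cdot)$ is the mean square of the difference, so almost surely:

$$\begin{aligned}
d(x^n, \tilde{g}_{n,K,u}(\tilde{f}_{n,K,u}(x^n))) &\le d(x^n, q_{K,u}(x^n)) + d(q_{K,u}(x^n), g_{n,K,u}(f_{n,K,u}(q_{K,u}(x^n)))) \\
&= \frac{1}{n}\sum_{i=1}^{n} (x_i - q_{K,u}(x_i))^2 + d(q_{K,u}(x^n), g_{n,K,u}(f_{n,K,u}(q_{K,u}(x^n)))) \qquad (82)
\end{aligned}$$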

1=1 

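We analyze the first term in (82). We decompose the sum of squares depending on how $x_i$ is quantized; remember that for
$|x| > Ku$ the quantization is 0, and we assume that $Ku$ is big enough that no discrete part of $x$ is larger than $Ku$:

$$\begin{aligned}
\frac{1}{n}\sum_{i=1}^{n} (x_i - q_{K,u}(x_i))^2 &= \frac{1}{n}\sum_{i: |x_i| < Ku} (x_i - q_{K,u}(x_i))^2 + \frac{1}{n}\sum_{i: |x_i| \ge Ku} x_i^2 \\
&\le u + \frac{1}{n}\sum_{i: |x_i| \ge Ku} x_i^2
\end{aligned}$$

Pick $u < \delta_0$ and $Ku$ big enough such that $E_x(1(|x| > Ku)x^2) < \delta_0 - u$; this is clearly doable because $E_x(x^2) < \infty$.
Now we use the weak law of large numbers:

$$\begin{aligned}
\Pr\left(\frac{1}{n}\sum_{i=1}^{n} (x_i - q_{K,u}(x_i))^2 > \delta_0\right) &\le \Pr\left(\frac{1}{n}\sum_{i: |x_i| > Ku} x_i^2 > \delta_0 - u\right) \\
&= \Pr\left(\frac{1}{n}\sum_{i=1}^{n} 1(|x_i| > Ku)x_i^2 > \delta_0 - u\right) \\
&\to 0 \text{ as } n \to \infty \qquad (83)
\end{aligned}$$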

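Now we can bound the following probability:

$$\begin{aligned}
\Pr\left(d(x^n, \tilde{g}_{n,K,u}(\tilde{f}_{n,K,u}(x^n))) > D + 2\delta_0\right) &\le \Pr\left(\frac{1}{n}\sum_{i=1}^{n} (x_i - q_{K,u}(x_i))^2 + d(q_{K,u}(x^n), g_{n,K,u}(f_{n,K,u}(q_{K,u}(x^n)))) > D + 2\delta_0\right) \qquad (84) \\
&\le \Pr\left(\frac{1}{n}\sum_{i=1}^{n} (x_i - q_{K,u}(x_i))^2 > \delta_0\right) \\
&\quad + \Pr\left(d(q_{K,u}(x^n), g_{n,K,u}(f_{n,K,u}(q_{K,u}(x^n)))) > D + \delta_0\right) \qquad (85) \\
&\to 0 \text{ as } n \to \infty \qquad (86)
\end{aligned}$$

where (84) follows from (82). (85) is true because $\Pr(x + y > 2\epsilon_0) \le \Pr(x > \epsilon_0 \text{ or } y > \epsilon_0) \le \Pr(x > \epsilon_0) + \Pr(y > \epsilon_0)$,
while (86) follows from (81) and (83).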

2) $R_E(D, y_{K,u}) \le R_E(D,x)$: We only need to show that if at a rate-distortion pair $(R, D)$ there is a good
lossy coder $f_n, g_n$ in the expected distortion sense for $x$, then there is a good lossy coder $\tilde{f}_{n,K,u}, \tilde{g}_{n,K,u}$ in the expected
distortion sense for $y_{K,u}$.

From the definition of the good lossy coder in the expected distortion sense, we know that

$$\lim_{n \to \infty} E(d(x^n, g_n(f_n(x^n)))) \le D$$

Now we construct a good lossy coder in the expected distortion sense. We implement the following "inverse"
map of $q_{K,u}$, denoted by $w_{K,u}$, where $w_{K,u}$ is a random map. For any real sequence $y^n$ generated by the random
variable $y_{K,u}$, $y_i$ can only take values in $A \triangleq \{ku : k = -K, \ldots, 0, \ldots, K\} \cup \{a \in R : P_x(a) > 0\}$. The inverse
map is $w_{K,u} : A \to R$, such that $w_{K,u}(y_{K,u}) \sim x$ and for all $y \in A$: $w_{K,u}(y) \in \{x \in R : q_{K,u}(x) = y\}$. Pictorially,
the inverse map maps the impulses in Figure 7 back to the mixed random variable with the probability density function
in Figure 6. The good lossy coder in the expected distortion sense for $y_{K,u}$ is, for all $y^n \in A^n$:

$$\tilde{f}_{n,K,u}(y^n) = f_n(w_{K,u}(y^n)), \quad \tilde{g}_{n,K,u}(\cdot) = g_n(\cdot).$$

Now we analyze the expected distortion of such a coder:

$$\begin{aligned}
E\left(d(y_{K,u}^n, \tilde{g}_{n,K,u}(\tilde{f}_{n,K,u}(y_{K,u}^n)))\right) &= E\left(d(y_{K,u}^n, g_n(f_n(w_{K,u}(y_{K,u}^n))))\right) \\
&\le E\left(d(y_{K,u}^n, w_{K,u}(y_{K,u}^n))\right) + E\left(d(w_{K,u}(y_{K,u}^n), g_n(f_n(w_{K,u}(y_{K,u}^n))))\right) \qquad (87)
\end{aligned}$$

The second term in (87) converges to $D$ as $n$ goes to infinity because $w_{K,u}(y_{K,u}^n) \sim x^n$ and $f_n, g_n$ is good for
$x^n$ in the expected distortion sense. As for the first term in (87), we show that it converges to zero for small $u$ and
big $Ku$ as $n$ goes to infinity.

$$\begin{aligned}
E_{y_{K,u}}\left(d(y_{K,u}^n, w_{K,u}(y_{K,u}^n))\right) &= E_{y_{K,u}}\left(\frac{1}{n}\sum_{i=1}^{n} (y_{K,u,i} - w_{K,u}(y_{K,u,i}))^2\right) \\
&= E_{y_{K,u}}\left((y_{K,u} - w_{K,u}(y_{K,u}))^2\right) \\
&= E_{y_{K,u}}\left(1(|w_{K,u}(y_{K,u})| < Ku)(y_{K,u} - w_{K,u}(y_{K,u}))^2\right) \\
&\quad + E_{y_{K,u}}\left(1(|w_{K,u}(y_{K,u})| \ge Ku)(y_{K,u} - w_{K,u}(y_{K,u}))^2\right) \\
&\le \frac{u^2}{4} + E_{y_{K,u}}\left(1(|w_{K,u}(y_{K,u})| \ge Ku)(y_{K,u} - w_{K,u}(y_{K,u}))^2\right) \qquad (88) \\
&= \frac{u^2}{4} + E_x\left(1(|x| \ge Ku)x^2\right) \qquad (89) \\
&\to 0 \text{ as } u \to 0 \text{ and } Ku \to \infty \qquad (90)
\end{aligned}$$

(88) is true because if $|w_{K,u}(y_{K,u})| < Ku$ then the quantization error is no bigger than $\frac{u}{2}$. (89) follows from
$w_{K,u}(y_{K,u}) \sim x$. (90) is true because the variance of $x$ is finite. (90) and (87) give us the desired result that the
expected distortion of $\tilde{f}_{n,K,u}, \tilde{g}_{n,K,u}$ converges to $D$ if $u$ goes to zero and $Ku$ goes to infinity.

B. Constructing a good lossy source coding in the expected distortion sense from a good one in the strong sense

The construction here is a general proof. It works for continuous, discrete and mixed random variables.
By constructing a good lossy source coder in the expected distortion sense from a good lossy coder in the strong
sense at the same rate-distortion point $(R, D)$, we can easily see that the rate distortion function in the strong sense
is not smaller than the rate distortion function in the expected distortion sense. This fact is used in the proof in
Appendix A.

Assume both the first and second order moments of $x$ are finite, i.e. $E(x) = \mu_x < \infty$ and $E(x^2) \triangleq \sigma_x < \infty$. If
$f_n, g_n$ is good in the strong sense for $R(D)$, then we denote by $T_n \subset R^n$ the subset on which the distortion constraint is not
satisfied, i.e. $T_n = \{x^n \in R^n : d(x^n, g_n(f_n(x^n))) > D + \delta\}$. Denote $\epsilon_n = \Pr(T_n)$; then $\epsilon_n \to 0$. A good lossy
coder might have $g_n(f_n(x^n))$ arbitrarily far away from $x^n$ for $x^n \in T_n$, as pointed out in Section I-B, and cause the
expected distortion to be arbitrarily large. We build a new lossy coding system $\tilde{f}_n, \tilde{g}_n$ such that $\tilde{g}_n(\tilde{f}_n(x^n)) = g_n(f_n(x^n))$
for $x^n \notin T_n$ and $\tilde{g}_n(\tilde{f}_n(x^n)) = 0$ for $x^n \in T_n$. Obviously $\tilde{f}_n, \tilde{g}_n$ is good in the strong sense; we only need to show that
$\tilde{f}_n, \tilde{g}_n$ is also good in the expected distortion sense. The expected distortion of $\tilde{f}_n, \tilde{g}_n$ is:

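$$\begin{aligned}
E(d(x^n, \tilde{g}_n(\tilde{f}_n(x^n)))) &= \Pr(x^n \in T_n^c)E(d(x^n, \tilde{g}_n(\tilde{f}_n(x^n))) \mid x^n \in T_n^c) + \Pr(x^n \in T_n)E(d(x^n, \tilde{g}_n(\tilde{f}_n(x^n))) \mid x^n \in T_n) \\
&\le (1 - \epsilon_n)(D + \delta) + \Pr(x^n \in T_n)E(d(x^n, \tilde{g}_n(\tilde{f}_n(x^n))) \mid x^n \in T_n) \\
&= (1 - \epsilon_n)(D + \delta) + \Pr(x^n \in T_n)E\left(\frac{1}{n}\sum_{i=1}^{n} x_i^2 \mid x^n \in T_n\right) \qquad (91)
\end{aligned}$$

Now we upper bound the second term. First, by the weak law of large numbers and because the variance and the
mean of $x$ are finite, we know that for any $\epsilon > 0$ there exists $n_\epsilon < \infty$ s.t. for all $n > n_\epsilon$:

$$\Pr\left(\left|\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \sigma_x\right| > \epsilon\right) < \epsilon. \qquad (92)$$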

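This implies that for any subset $\Gamma \subset R^n$ with measure $\Pr(\Gamma) \ge 1 - \epsilon$, there is a subset $\Gamma_1 \subset \Gamma$ such that
$\Pr(\Gamma_1) \ge 1 - 2\epsilon$ and for all $x^n \in \Gamma_1$: $\left|\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \sigma_x\right| \le \epsilon$.

From the definition of $\epsilon_n$, we know that for large enough $n$, $\epsilon_n = \Pr(T_n) < \epsilon$, or equivalently $\Pr(T_n^c) > 1 - \epsilon$.
From the above discussion, there exists a subset $\Gamma_1 \subset T_n^c$ such that $\Pr(\Gamma_1) \ge 1 - 2\epsilon$ and for all $x^n \in \Gamma_1$:
$\left|\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \sigma_x\right| \le \epsilon$. The expectation of the mean square of $x^n$ can be decomposed:

$$\begin{aligned}
\sigma_x &= E\left(\frac{1}{n}\sum_{i=1}^{n} x_i^2\right) \\
&= \Pr(x^n \in T_n^c)E\left(\frac{1}{n}\sum_{i=1}^{n} x_i^2 \mid x^n \in T_n^c\right) + \Pr(x^n \in T_n)E\left(\frac{1}{n}\sum_{i=1}^{n} x_i^2 \mid x^n \in T_n\right) \\
&\ge \Pr(x^n \in \Gamma_1)E\left(\frac{1}{n}\sum_{i=1}^{n} x_i^2 \mid x^n \in \Gamma_1\right) + \Pr(x^n \in T_n)E\left(\frac{1}{n}\sum_{i=1}^{n} x_i^2 \mid x^n \in T_n\right) \\
&\ge (1 - 2\epsilon)(\sigma_x - \epsilon) + \Pr(x^n \in T_n)E\left(\frac{1}{n}\sum_{i=1}^{n} x_i^2 \mid x^n \in T_n\right)
\end{aligned}$$

Hence:

$$\Pr(x^n \in T_n)E\left(\frac{1}{n}\sum_{i=1}^{n} x_i^2 \mid x^n \in T_n\right) \le \epsilon(1 + 2\sigma_x) \qquad (93)$$

Substituting (93) into (91), we have:

$$E(d(x^n, \tilde{g}_n(\tilde{f}_n(x^n)))) \le (1 - \epsilon_n)(D + \delta) + \epsilon(1 + 2\sigma_x) \le D + \delta + \epsilon(1 + 2\sigma_x)$$

Note that the above is true for all $\epsilon$ and $\delta$, so we can let both be arbitrarily small, and the expected distortion of
$\tilde{f}_n, \tilde{g}_n$ is arbitrarily close to $D$. Hence we just constructed a good lossy coding system in the expected distortion
sense from a good lossy coding system in the strong sense. □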
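The construction of $\tilde{f}_n, \tilde{g}_n$ can be sketched in code. This is only an illustration under assumptions not in the text: the encoder is assumed to be able to run the decoder locally and to signal the atypical set $T_n$ with one extra flag (which does not change the asymptotic rate), and the toy base coder below is hypothetical and chosen only so that some inputs violate the distortion constraint.

```python
def mse(x, y):
    """Mean squared error distortion d(x^n, y^n)."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def make_truncated_coder(encode, decode, D, delta):
    """Wrap (encode, decode) so the reconstruction is all-zero on T_n."""
    def encode_t(x):
        msg = encode(x)
        ok = mse(x, decode(msg)) <= D + delta  # x not in T_n
        return ok, msg                         # flag is an assumed extra bit

    def decode_t(flagged):
        ok, msg = flagged
        rec = decode(msg)
        return rec if ok else [0.0] * len(rec)

    return encode_t, decode_t


# Hypothetical toy coder: round each sample and clip it to [-1, 1].
def toy_encode(x):
    return [max(-1, min(1, round(v))) for v in x]


def toy_decode(msg):
    return [float(v) for v in msg]


enc, dec = make_truncated_coder(toy_encode, toy_decode, D=0.05, delta=0.05)
good = [0.1, -0.9, 1.2]   # distortion 0.02, reconstruction kept
bad = [10.0, 0.0, 0.0]    # distortion 27 with the toy coder, replaced by zeros
print(dec(enc(good)), dec(enc(bad)))
```

On $T_n$ the distortion of the wrapped coder equals $\frac{1}{n}\sum_i x_i^2$, which is exactly the quantity bounded in (93).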
C. Proof of the simple bounds: proof of Propositions 1, 2, 3 and 4
Proof of Proposition 1:

To show $R(D, \Xi(p, \sigma^2)) \ge R(\frac{D}{\sigma^2}, \Xi(p, 1))$, we only need to construct a sequence of good (in the strong sense)
encoder/decoder pairs $(f'_n, g'_n)$, $n = 1, 2, \ldots$, for $\Xi(p, 1)$ from that for $\Xi(p, \sigma^2)$, $(f_n, g_n)$, $n =
1, 2, \ldots$. Let $f'_n$ and $g'_n$ be as follows, for all $x^n \in X^n$ and $a^{nR} \in \{0,1\}^{nR}$:

$$f'_n(x^n) = f_n(\sigma x^n), \quad g'_n(a^{nR}) = \frac{1}{\sigma} g_n(a^{nR})$$

So for $x \sim \Xi(p, 1)$:

$$\begin{aligned}
\Pr\left(d(x^n, g'_n(f'_n(x^n))) > \frac{D + \delta}{\sigma^2}\right) &= \Pr\left(d\left(x^n, \frac{1}{\sigma} g_n(f_n(\sigma x^n))\right) > \frac{D + \delta}{\sigma^2}\right) \\
&= \Pr\left(d(\sigma x^n, g_n(f_n(\sigma x^n))) > D + \delta\right) \qquad (94)
\end{aligned}$$

where (94) is because the distortion measure is $d(x, y) = (x - y)^2$ in this paper.

Obviously for $x \sim \Xi(p, 1)$, $\sigma x \sim \Xi(p, \sigma^2)$, and if $f_n$ and $g_n$ are good in the strong sense for
$\Xi(p, \sigma^2)$, then for all $\delta > 0$:

$$\lim_{n \to \infty} \Pr\left(d(\sigma x^n, g_n(f_n(\sigma x^n))) > D + \delta\right) = 0. \qquad (95)$$

Combining (94) and (95), we have:

$$\lim_{n \to \infty} \Pr\left(d(x^n, g'_n(f'_n(x^n))) > \frac{D + \delta}{\sigma^2}\right) = 0.$$

Notice that $\delta$ is an arbitrary positive number and $\sigma$ is constant, so we just showed that $R(D, \Xi(p, \sigma^2)) \ge R(\frac{D}{\sigma^2}, \Xi(p, 1))$.

Similarly we can show that $R(D, \Xi(p, \sigma^2)) \le R(\frac{D}{\sigma^2}, \Xi(p, 1))$. This completes the proof that $R(D, \Xi(p, \sigma^2)) =
R(\frac{D}{\sigma^2}, \Xi(p, 1))$. □

Proof of Proposition |2l for a Bernoulli-Gaussian random sequence x", by Definition [T] we know that Xi = biXSi, 
bi ^ Bernoulli — p and Si ~ N{0, 1) are i.i.d random variables. The encoder /„ works as follows. It is consisted 
of two parts. First the encoder encode fa" losslessly using a fixed length code-book. Then the encoder encode lossily 
the subsequence of s" where fai 7^ by applying standard Gaussian lossy source coding. 

We now describe the coding scheme fn,gn, in details. If 6" is ei-strong typical, and write 1(6") as the number 
of I's in sequence 5". i.e.: 

6" g B" ^ {6" e {0,1}" : -p| < ei}. 

n 

then /„ one-to-one maps 6" to a binary sequence of length n{H{p) + T{ei)) excluding the all zero signal, otherwise 
&" ^ B"^, fn sends the all zero signal, where r(ei) ^ if ei ^ 0, this is guaranteed by the standard lossless 
source coding theorem. Obviously for all ei > 0: 

lim Pr(i)" ^ B^" ) = (96) 

Now for each x" = &" x s", if 6" G _B" , we know that n{p — ei) < 1(&") < + ei). Denote by a new 
sequence si, ...sijbn) the non zero entries of a;". Then the encoder /„ passes s^^'' '> to a good Gaussian lossy 
encoder-decoder pair with rate R{^, N{0,1)) for a sequence of length 1(&"). If output of 

when 1(6") < n(p + ei), is shorter than n{p + ei)R{^, N{0, 1)), /„ just pad zeros at the end. The total block 
length for x" is 

niHip) + T(ei)) + nip + ei)R{-,N{0, 1)). (97) 

P 

If the output form the encoder is not a all zero sequence, the decoder g„ first looks at the first n{H{p) + T(ei)) 
bits and recover 6" exactly and hence 1(6"). Then gn discards the padded zeros at the end and pass the rest to the 
Gaussian lossy decoder 5i(b>.) with rate R{^, N{0,1)) for a sequence of length 1(6"). Then g„ put the outputs 
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of to the non-zero locations of 6" one by one. By using the coding system described above, we have for 

nd(a;",5„(/„(x"))) = l(6")d(Si(''"),5i(,„)(A(,„)(Si(^")))) (98) 

and because s" and 6" are independent and the coding system fi(b^),gi(b") is good, for all fixed b" € i?"^, for all 
<5o >0: 

lim Pr fd(si(''"),gi(,„)(/i(,„)(si(''")))) > ^ + <5o) = 0. (99) 
Now we evaluate the performance of fn-,9n^ for all 5i > 0: 
lim Pr(d(x",g„(/„(x"))) > i^ + ,5i) < lim {Pr (fa" ^ B^^) + Pr (d(x", g„(/„(x"))) > D + Si\b^ E B^JJlOQ) 

n — '■oo n— »oo 

= lim Pr(d(x",5„(/„(x")))>i? + <5i|fa"eBrJ (101) 
^ Jim Pr (d(5i(^"),5i(..)(A(..)(si(^")))) > G S:;) (102) 

^ lim Pr fd(si(^"),^,(,„)(A(,„)(si(^")))) > ^ + ^if^lfa" e 

= (104) 

(fTOOl i is because for events A and B, Pr(A) = Pr(A, B) + Pi{A, B") < Pr{B) + Pr(^|B=). ( fToTT i is true because 
of (|96ll. (HUl implies ( fT02b . ( fT03] ) is true because 1(6") < n(p + ei) if 6" G B^^. Finally, for any Si, by letting ei 
small enough, hence ^^fj^ ^^ > and by ( |99] l and the fact that s^^''") is induced by x", we have ( I104l l. 
(97) together with (104) implies that:

$$R(D,p) \le H(p) + p R(\tfrac{D}{p}, N(0,1)) + \tau(\epsilon_1) + \epsilon_1 R(\tfrac{D}{p}, N(0,1)).$$

Notice that we can pick $\epsilon_1$ arbitrarily small and hence $\tau(\epsilon_1)$ arbitrarily small, so we have $R(D,p) \le H(p) + p R(\frac{D}{p}, N(0,1)) = H(p) + p R(D, N(0,p))$. □
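As a quick numerical illustration of the bound just derived, the following sketch evaluates $H(p) + pR(\frac{D}{p}, N(0,1))$ in bits and checks that it agrees with $H(p) + pR(D, N(0,p))$. It assumes the Gaussian rate distortion function $R(D, N(0,\sigma^2)) = \max\{\frac{1}{2}\log_2\frac{\sigma^2}{D}, 0\}$ used in this paper; the function names and the sample values of $p$ and $D$ are hypothetical and chosen only for illustration.

```python
import math


def binary_entropy(p):
    """H(p) in bits, with the convention H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def gaussian_rd(D, var):
    """R(D, N(0, var)) = max{(1/2) log2(var / D), 0}, in bits per sample."""
    return max(0.5 * math.log2(var / D), 0.0)


def upper_bound(D, p):
    """H(p) + p * R(D/p, N(0,1)): describe the support, then the non-zero values."""
    return binary_entropy(p) + p * gaussian_rd(D / p, 1.0)


if __name__ == "__main__":
    p, D = 0.1, 0.001  # illustrative values only
    print("H(p) + p R(D/p, N(0,1)) =", upper_bound(D, p))
    print("H(p) + p R(D, N(0,p))   =", binary_entropy(p) + p * gaussian_rd(D, p))
```

The two printed values coincide up to rounding because $R(\frac{D}{p}, N(0,1))$ and $R(D, N(0,p))$ are both $\frac{1}{2}\log_2\frac{p}{D}$ when $D < p$.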

Proof of Proposition 3: this is a direct corollary of the upper bound in Corollary 2. Notice that the variance of a Bernoulli-Gaussian random variable $\Xi(p,1)$ is $p$, so according to Corollary 2, if $\Xi(p,1)$ were a continuous random variable, we would have:

$$R(D,p) \le \max\{\tfrac{1}{2}\log\tfrac{p}{D}, 0\} = R(D, N(0,p)) \tag{105}$$

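The technicality here is that $\Xi(p,1)$ is not a continuous random variable, but the fix is quite easy. Let random variable $\mathsf{y}_m$ be the $p$-mixture of a Gaussian $N(0,1)$ and a uniformly distributed random variable on $[-\frac{1}{m}, \frac{1}{m}]$, i.e. with probability $p$, $\mathsf{y}_m \sim N(0,1)$ and with probability $1-p$, $\mathsf{y}_m \sim U[-\frac{1}{m}, \frac{1}{m}]$. The pdf of $\mathsf{y}_m$ is

$$f_{\mathsf{y}_m}(y) = \frac{p}{\sqrt{2\pi}} e^{-\frac{y^2}{2}} + \frac{(1-p)m}{2}\, 1\big(|y| \le \tfrac{1}{m}\big). \tag{106}$$

Obviously, $\mathsf{y}_m$ is a continuous random variable with variance $p + \frac{1-p}{3m^2}$. Now according to Corollary 2, we know that the rate distortion function for $\mathsf{y}_m$, $R_{\mathsf{y}_m}(D)$, is upper bounded by

$$\max\Big\{\tfrac{1}{2}\log\frac{p + \frac{1-p}{3m^2}}{D}, 0\Big\} = R\big(D, N(0, p + \tfrac{1-p}{3m^2})\big). \tag{107}$$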
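Now we upper bound $R(D,p)$ by constructing a good randomized lossy coding system for $\mathsf{x} \sim \Xi(p,1)$ in the average sense, $f^x_n, g^x_n$, from a good lossy coding system for $\mathsf{y}_m$, $f^y_n, g^y_n$. Given $\mathsf{x}^n \sim \Xi(p,1)$, $f^x_n$ applies the following operation on it: for $i = 1, 2, \ldots, n$, let

$$\mathsf{z}_i = \begin{cases} \mathsf{x}_i & \text{if } \mathsf{x}_i \neq 0, \\ \mathsf{u}_i & \text{if } \mathsf{x}_i = 0, \end{cases}$$

where the $\mathsf{u}_i$'s are independent and $\mathsf{u}_i \sim U[-\frac{1}{m}, \frac{1}{m}]$ is a uniformly distributed random variable. It is clear that $\mathsf{z}_i$ has the same distribution as $\mathsf{y}_m$. Now $f^x_n$ passes $\mathsf{z}^n$ to the encoder $f^y_n$. The decoder is $g^x_n = g^y_n$. Now we analyze the performance of the coding system $f^x_n, g^x_n$.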
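The dithering step can be sketched as follows. This is only an illustration, assuming (as the later steps of the proof indicate) that the uniform noise on $[-\frac{1}{m}, \frac{1}{m}]$ is placed only at the zero entries of $x^n$; the function names, the seed, and the sample sizes are hypothetical.

```python
import random


def bernoulli_gaussian(n, p, rng):
    """Draw x^n with x_i = b_i * s_i, b_i ~ Bernoulli(p), s_i ~ N(0, 1)."""
    return [rng.gauss(0.0, 1.0) if rng.random() < p else 0.0 for _ in range(n)]


def dither(x, m, rng):
    """Replace every zero entry of x by an independent U[-1/m, 1/m] sample."""
    return [xi if xi != 0.0 else rng.uniform(-1.0 / m, 1.0 / m) for xi in x]


if __name__ == "__main__":
    rng = random.Random(0)
    n, p, m = 200000, 0.1, 4  # illustrative values only
    x = bernoulli_gaussian(n, p, rng)
    z = dither(x, m, rng)
    empirical_second_moment = sum(zi * zi for zi in z) / n
    # Both components have zero mean, so the variance of the mixture is
    # p * 1 + (1 - p) * (1 / (3 m^2)).
    print(empirical_second_moment, p + (1 - p) / (3 * m * m))
```

By construction every entry of the dithered sequence is within $\frac{1}{m}$ of the corresponding entry of $x^n$, which is the only property the distortion analysis below relies on.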

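First, because $f^y_n, g^y_n$ is good in the strong sense for $\mathsf{y}_m$, we have, for any $\delta_1 > 0$:

$$\lim_{n\to\infty} \Pr\big(d(\mathsf{z}^n, g^y_n(f^y_n(\mathsf{z}^n))) > D + \delta_1\big) = 0.$$

From the construction of $f^x_n, g^x_n$, we know that $g^x_n(f^x_n(\mathsf{x}^n)) = g^y_n(f^y_n(\mathsf{z}^n))$ a.s., where $\mathsf{z}^n$ is induced from $\mathsf{x}^n$ and $\mathsf{u}^n$. Denote $g^x_n(f^x_n(\mathsf{x}^n))$, or equivalently $g^y_n(f^y_n(\mathsf{z}^n))$, by $\mathsf{w}^n$, so:

$$\lim_{n\to\infty} \Pr\big(d(\mathsf{z}^n, g^x_n(f^x_n(\mathsf{x}^n))) > D + \delta_1\big) = \lim_{n\to\infty} \Pr\big(d(\mathsf{z}^n, \mathsf{w}^n) > D + \delta_1\big) = 0. \tag{109}$$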

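Secondly, from the construction of $\mathsf{z}^n$, we know that for all $i$, $|\mathsf{x}_i - \mathsf{z}_i| \le \frac{1}{m}$ a.s. So we have a.s.:

\begin{align}
d(\mathsf{x}^n, \mathsf{w}^n) &= \frac{1}{n}\sum_{i=1}^n (\mathsf{x}_i - \mathsf{w}_i)^2 \nonumber \\
&= \frac{1}{n}\sum_{i=1}^n (\mathsf{x}_i - \mathsf{z}_i + \mathsf{z}_i - \mathsf{w}_i)^2 \nonumber \\
&\le \frac{1}{n}\sum_{i=1}^n (|\mathsf{x}_i - \mathsf{z}_i| + |\mathsf{z}_i - \mathsf{w}_i|)^2 \nonumber \\
&\le \frac{1}{n}\sum_{i=1}^n \Big(\frac{1}{m} + |\mathsf{z}_i - \mathsf{w}_i|\Big)^2 \nonumber \\
&\le \frac{1}{m^2} + \frac{1}{n}\sum_{i=1}^n (\mathsf{z}_i - \mathsf{w}_i)^2 + \frac{2}{nm}\sum_{i=1}^n |\mathsf{z}_i - \mathsf{w}_i| \tag{110}
\end{align}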

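By the Cauchy-Schwarz inequality, a.s.:

$$\Big(\sum_{i=1}^n |\mathsf{z}_i - \mathsf{w}_i|\Big)^2 \le \Big(\sum_{i=1}^n |\mathsf{z}_i - \mathsf{w}_i|^2\Big)\Big(\sum_{i=1}^n 1\Big)$$

hence a.s.

$$\frac{1}{n}\sum_{i=1}^n |\mathsf{z}_i - \mathsf{w}_i| \le \sqrt{\frac{1}{n}\sum_{i=1}^n |\mathsf{z}_i - \mathsf{w}_i|^2} \tag{111}$$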



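Now for a realization $x^n$ and $u^n$ of $\mathsf{x}^n$ and $\mathsf{u}^n$, the induced realizations of $\mathsf{z}^n$ and $\mathsf{w}^n$ are $z^n$ and $w^n$ respectively. If $d(z^n, w^n) \le D + \delta_1$, then combining (110) and (111), we have:

$$d(x^n, w^n) \le \frac{1}{m^2} + D + \delta_1 + \frac{2\sqrt{D + \delta_1}}{m} \le D + \delta_1 + \frac{1 + 2\sqrt{D + \delta_1}}{m}.$$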



This means that

$$\Pr\big(d(\mathsf{z}^n, \mathsf{w}^n) > D + \delta_1\big) \ge \Pr\Big(d(\mathsf{x}^n, \mathsf{w}^n) > D + \delta_1 + \frac{1 + 2\sqrt{D + \delta_1}}{m}\Big) \tag{112}$$
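The deterministic inequality behind (112) can be checked numerically. The sketch below is an illustration under stated assumptions: the reconstruction $w^n$ is a hypothetical stand-in (the dithered sequence plus Gaussian noise) rather than the output of an actual lossy decoder, and the helper names are not from the paper. Whenever $|x_i - z_i| \le \frac{1}{m}$ for all $i$, the bound $d(x^n, w^n) \le \frac{1}{m^2} + d(z^n, w^n) + \frac{2}{m}\sqrt{d(z^n, w^n)}$ must hold.

```python
import math
import random


def mse(a, b):
    """Average squared-error distortion d(a^n, b^n)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def distortion_bound(d_zw, m):
    """1/m^2 + d(z,w) + (2/m) sqrt(d(z,w)), i.e. (110) combined with (111)."""
    return 1.0 / m ** 2 + d_zw + 2.0 * math.sqrt(d_zw) / m


if __name__ == "__main__":
    rng = random.Random(0)
    n, m = 1000, 4  # illustrative values only
    x = [rng.gauss(0, 1) if rng.random() < 0.1 else 0.0 for _ in range(n)]
    z = [xi if xi != 0.0 else rng.uniform(-1 / m, 1 / m) for xi in x]
    w = [zi + rng.gauss(0, 0.3) for zi in z]  # stand-in for a decoder output
    print(mse(x, w), "<=", distortion_bound(mse(z, w), m))
```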
The above coding system is a randomized coding system where the performance is measured under the distribution of the "dithering" random variable $\mathsf{u}$. Now we remove the averaging over the "dithering" as follows: for each $x^n \in \mathcal{R}^n$, if

$$\Pr\Big(d(x^n, \mathsf{w}^n) > D + \delta_1 + \frac{1 + 2\sqrt{D + \delta_1}}{m}\Big) < 1,$$

where the probability is over $\mathsf{u}^n$, there exists $u_i(x^n) \in [-\frac{1}{m}, \frac{1}{m}]$, and of course $u_i(x^n) = 0$ if $x_i \neq 0$, such that the distortion between $x^n$ and the output of the lossy coding system $f^y_n, g^y_n$ with input $x^n + u^n(x^n)$ is no bigger than $D + \delta_1 + \frac{1 + 2\sqrt{D + \delta_1}}{m}$:

$$d\big(x^n, g^y_n(f^y_n(x^n + u^n(x^n)))\big) \le D + \delta_1 + \frac{1 + 2\sqrt{D + \delta_1}}{m};$$

otherwise, simply let $u^n(x^n) = 0$.

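Finally, let $f_n$ and $g_n$ be such that, for all $x^n$, $f_n(x^n) = f^y_n(x^n + u^n(x^n))$ and $g_n = g^y_n$. The construction of $g_n, f_n$ implies that

$$\Pr\Big(d(\mathsf{x}^n, g_n(f_n(\mathsf{x}^n))) > D + \delta_1 + \frac{1 + 2\sqrt{D + \delta_1}}{m}\Big) \le \Pr\Big(d(\mathsf{x}^n, \mathsf{w}^n) > D + \delta_1 + \frac{1 + 2\sqrt{D + \delta_1}}{m}\Big) \tag{113}$$

Now combining (109), (112) and (113), we have:

$$\lim_{n\to\infty} \Pr\Big(d(\mathsf{x}^n, g_n(f_n(\mathsf{x}^n))) > D + \delta_1 + \frac{1 + 2\sqrt{D + \delta_1}}{m}\Big) = 0.$$

Note that the rate of the coding system is $R_{\mathsf{y}_m}(D)$, which is upper bounded by $R(D, N(0, p + \frac{1-p}{3m^2}))$ in (107). So

$$R\Big(D + \delta_1 + \frac{1 + 2\sqrt{D + \delta_1}}{m}, p\Big) \le R\Big(D, N\big(0, p + \frac{1-p}{3m^2}\big)\Big) \tag{114}$$

and (114) is true for all $\delta_1 > 0$ and $m \in \mathcal{N}$. Note that the Gaussian rate distortion function $R(D, N(0, \sigma^2))$ is continuous in $\sigma^2$, and the Bernoulli-Gaussian rate distortion function $R(D,p)$ is monotonically decreasing and bounded in $D$, hence continuous almost everywhere. By letting $m \to \infty$ and $\delta_1 \to 0$, we have: $R(D,p) \le R(D, N(0,p))$. □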

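Proof of Proposition 4: For a good lossy coding system $f_n, g_n$ for the Bernoulli-Gaussian sequence $\mathsf{x}^n = \mathsf{b}^n \times \mathsf{s}^n \sim \Xi(p,1)$ defined in Definition 1 and distortion constraint $D$, the rate is $R(D,p)$, i.e.

$$f_n : \mathcal{R}^n \to \{0,1\}^{nR(D,p)}, \quad g_n : \{0,1\}^{nR(D,p)} \to \mathcal{R}^n, \quad \Pr\big(d(\mathsf{x}^n, g_n(f_n(\mathsf{x}^n))) > D + \delta_1\big) = e_n$$

$$\text{and for all } \delta_1 > 0: \ \lim_{n\to\infty} e_n = 0. \tag{115}$$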

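We use the same notations as those in the proof of Proposition 2. We construct a good length $m_n \in [n(p - \epsilon_1), n(p + \epsilon_1)]$ lossy source coding system $f_{m_n}, g_{m_n}$ for $\mathsf{s}^{m_n} \sim N(0,p)$ under the same distortion constraint $D$, where $m_n$ will be determined later. First we decompose $e_n$. By (96), we know that there exists $n_{\epsilon_1} < \infty$ such that for all $n > n_{\epsilon_1}$, $\Pr(\mathsf{b}^n \in B^n_{\epsilon_1}) > \frac{1}{2}$, so for all $n > n_{\epsilon_1}$:

\begin{align}
e_n &= \Pr\big(d(\mathsf{x}^n, g_n(f_n(\mathsf{x}^n))) > D + \delta_1\big) \nonumber \\
&\ge \Pr\big(\mathsf{b}^n \in B^n_{\epsilon_1},\ d(\mathsf{x}^n, g_n(f_n(\mathsf{x}^n))) > D + \delta_1\big) \tag{116} \\
&= \sum_{b^n \in B^n_{\epsilon_1}} \Pr\big(\mathsf{b}^n = b^n,\ d(\mathsf{x}^n, g_n(f_n(\mathsf{x}^n))) > D + \delta_1\big) \tag{117} \\
&= \Pr(\mathsf{b}^n \in B^n_{\epsilon_1}) \sum_{b^n \in B^n_{\epsilon_1}} \frac{\Pr(\mathsf{b}^n = b^n)}{\Pr(\mathsf{b}^n \in B^n_{\epsilon_1})} \Pr\big(d(\mathsf{x}^n, g_n(f_n(\mathsf{x}^n))) > D + \delta_1 \,\big|\, \mathsf{b}^n = b^n\big) \tag{118} \\
&\ge \frac{1}{2} \sum_{b^n \in B^n_{\epsilon_1}} \phi(b^n) \Pr\big(d(\mathsf{x}^n, g_n(f_n(\mathsf{x}^n))) > D + \delta_1 \,\big|\, \mathsf{b}^n = b^n\big) \tag{119}
\end{align}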
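(116), (117) and (118) are obvious. In (119), we denote $\frac{\Pr(\mathsf{b}^n = b^n)}{\Pr(\mathsf{b}^n \in B^n_{\epsilon_1})}$ by $\phi(b^n)$. Notice that $\phi(\cdot)$ is a probability measure on $B^n_{\epsilon_1}$. Hence, since the $\phi$-average in (119) is at most $2e_n$, there exists $b^n \in B^n_{\epsilon_1}$, write $1(b^n) = m_n \in [n(p - \epsilon_1), n(p + \epsilon_1)]$, such that:

$$\Pr\big(d(\mathsf{x}^n, g_n(f_n(\mathsf{x}^n))) > D + \delta_1 \,\big|\, \mathsf{b}^n = b^n\big) \le 2e_n. \tag{120}$$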

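We bound the distortion of $x^n$ as follows. Let $l_1 < l_2 < \ldots < l_{m_n}$, $L = \{l_1, \ldots, l_{m_n}\}$, be the positions of the non-zero elements of $b^n$; then

$$n\, d(x^n, g_n(f_n(x^n))) \ge \sum_{i=1}^{m_n} \big(x_{l_i} - g_n(f_n(x^n))_{l_i}\big)^2. \tag{121}$$

Substituting (121) into (120), we have:

$$\Pr\Big(\sum_{i=1}^{m_n} \big(\mathsf{x}_{l_i} - g_n(f_n(\mathsf{x}^n))_{l_i}\big)^2 > n(D + \delta_1) \,\Big|\, \mathsf{b}^n = b^n\Big) \le 2e_n. \tag{122}$$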

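Now we are ready to construct a good lossy source coding system $f_{m_n}, g_{m_n}$ for $\mathsf{s}^{m_n} \sim N(0,p)$. The encoder $f_{m_n}$ works as follows: for any sequence $s^{m_n} \in \mathcal{R}^{m_n}$, $f_{m_n}(s^{m_n}) = f_n(T(s^{m_n}))$. The decoder works as follows: for a binary sequence $a^{R(D,p)n} \in \{0,1\}^{R(D,p)n}$, $g_{m_n}(a^{R(D,p)n}) = T^{-1}(g_n(a^{R(D,p)n}))$. Here $T$ is a one-to-one map from $\mathcal{R}^{m_n}$ to $\mathcal{R}^n$:

$T(s^{m_n}) = \bar{s}^n$, where $\bar{s}_{l_i} = s_i$, $i = 1, 2, \ldots, m_n$ and $\bar{s}_i = 0$, $i \notin L$
$T^{-1}(\bar{s}^n) = s^{m_n}$, where $s_i = \bar{s}_{l_i}$, $i = 1, 2, \ldots, m_n$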
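The maps $T$ and $T^{-1}$ can be sketched in a few lines. This is a minimal illustration with hypothetical function names, using 0-based positions for $L$: `embed` places a short sequence at the positions in $L$ of an otherwise all-zero length-$n$ vector, and `project` reads those positions back. The last helper computes the restricted squared error that appears on the right-hand side of (121).

```python
def embed(s, positions, n):
    """T: put s_1..s_m at positions l_1 < ... < l_m of a length-n zero vector."""
    out = [0.0] * n
    for si, li in zip(s, positions):
        out[li] = si
    return out


def project(x, positions):
    """T^{-1}: keep only the entries of x at positions l_1 < ... < l_m."""
    return [x[li] for li in positions]


def restricted_sq_error(x, y, positions):
    """Sum over i in L of (x_i - y_i)^2, the right-hand side of (121)."""
    return sum((x[li] - y[li]) ** 2 for li in positions)


if __name__ == "__main__":
    s = [1.5, -2.0, 0.25]
    L = [1, 4, 5]            # illustrative support, 0-based
    x = embed(s, L, 7)
    print(x, project(x, L))
```

Because every term of the full squared error is non-negative, dropping the positions outside $L$ can only decrease the sum, which is exactly the inequality (121).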

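Since $\mathsf{x}^n = \mathsf{b}^n \times \mathsf{s}^n$, if $\mathsf{b}^n = b^n$ then $\mathsf{x}_i = 0$ for all $i \notin L$, and by the memorylessness of $\mathsf{x}^n$, we have:

$$\Pr\big(m_n\, d(\mathsf{s}^{m_n}, g_{m_n}(f_{m_n}(\mathsf{s}^{m_n}))) > n(D + \delta_1)\big) = \Pr\Big(\sum_{i=1}^{m_n} \big(\mathsf{s}_i - g_{m_n}(f_{m_n}(\mathsf{s}^{m_n}))_i\big)^2 > n(D + \delta_1)\Big) \le 2e_n, \tag{123}$$

where the inequality is by (122). Notice that $m_n = 1(b^n) \in [n(p - \epsilon_1), n(p + \epsilon_1)]$, so $\frac{n}{m_n} \in [\frac{1}{p + \epsilon_1}, \frac{1}{p - \epsilon_1}]$. So (115) and (123) tell us:

\begin{align}
0 &= \lim_{n\to\infty} \Pr\big(m_n\, d(\mathsf{s}^{m_n}, g_{m_n}(f_{m_n}(\mathsf{s}^{m_n}))) > n(D + \delta_1)\big) \nonumber \\
&= \lim_{n\to\infty} \Pr\Big(d(\mathsf{s}^{m_n}, g_{m_n}(f_{m_n}(\mathsf{s}^{m_n}))) > \frac{n}{m_n}(D + \delta_1)\Big) \nonumber \\
&\ge \lim_{n\to\infty} \Pr\Big(d(\mathsf{s}^{m_n}, g_{m_n}(f_{m_n}(\mathsf{s}^{m_n}))) > \frac{1}{p - \epsilon_1}(D + \delta_1)\Big). \tag{124}
\end{align}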

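The encoder-decoder pair $f_{m_n}, g_{m_n}$ uses $nR(D,p)$ bits, so the rate of this coding system is $\frac{nR(D,p)}{m_n} \le \frac{R(D,p)}{p - \epsilon_1}$. (124) is true for all $\delta_1$ and $\epsilon_1$. By letting $\epsilon_1 \to 0$ and $\delta_1 \to 0$, we have just constructed a rate $\frac{R(D,p)}{p}$, distortion $\frac{D}{p}$ coding system for i.i.d. Gaussian random variables $\mathsf{s}^{m_n} \sim N(0,1)$. From Corollary 1 we know that $\frac{R(D,p)}{p} \ge R(\frac{D}{p}, N(0,1))$, i.e.

$$R(D,p) \ge p R(\tfrac{D}{p}, N(0,1)) = p R(D, N(0,p))$$

□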
D. Strong Typical Gaussian Sequences

In this appendix we define and investigate the properties of the so-called strong typical Gaussian sequences. For a sequence $s^n \in \mathcal{R}^n$ and a real number $T \in \mathcal{R}$, the empirical $l$-th moment of the entries of $s^n$ within the interval $[T, \infty)$ is denoted by

$$n^l_{s^n}(T) = \frac{1}{n}\sum_{i=1}^n 1(s_i \ge T)\, s_i^l.$$

Definition 4: $\epsilon$-typical Gaussian sequences: A sequence $s^n$ is said to be $\epsilon$-typical for $N(0,1)$ if the following is true, where the supremum is over all real numbers $T \ge -\infty$:

$$\max_{l=0,1,2} \sup_{T} \Big| n^l_{s^n}(T) - \int_T^\infty s^l \frac{1}{\sqrt{2\pi}} e^{-\frac{s^2}{2}}\, ds \Big| \le \epsilon \tag{125}$$

The $\epsilon$-typical set of $N(0,1)$ is denoted by $S_\epsilon(n)$. Similar to the strong typical set for random sequences with finite alphabet, we have the following concentration lemma. Note that the convergence is uniform, in the sense that we ask the sequence to be typical for all real numbers $T$ simultaneously.
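To make the typicality condition concrete, the sketch below compares empirical truncated moments with the Gaussian integrals in (125). It rests on two assumptions that are not spelled out at this point in the text: the empirical moment is taken to be $\frac{1}{n}\sum_i 1(s_i \ge T)\, s_i^l$, and the supremum over $T$ is approximated on a finite grid of thresholds. The closed forms used are $\int_T^\infty \varphi = Q(T)$, $\int_T^\infty s\varphi = \varphi(T)$ and $\int_T^\infty s^2\varphi = T\varphi(T) + Q(T)$, where $\varphi$ is the standard normal pdf and $Q$ its tail probability. All function names are hypothetical.

```python
import math
import random


def gaussian_tail_moment(l, T):
    """Integral of s^l * phi(s) over [T, inf) for l in {0, 1, 2}."""
    phi = math.exp(-T * T / 2.0) / math.sqrt(2.0 * math.pi)
    q = 0.5 * math.erfc(T / math.sqrt(2.0))
    if l == 0:
        return q
    if l == 1:
        return phi
    if l == 2:
        return T * phi + q
    raise ValueError("l must be 0, 1 or 2")


def empirical_tail_moment(s, l, T):
    """(1/n) * sum of s_i^l over the entries with s_i >= T."""
    return sum(si ** l for si in s if si >= T) / len(s)


def max_deviation(s, thresholds):
    """Largest gap between empirical and Gaussian moments over a finite grid."""
    return max(
        abs(empirical_tail_moment(s, l, T) - gaussian_tail_moment(l, T))
        for l in (0, 1, 2)
        for T in thresholds
    )


if __name__ == "__main__":
    rng = random.Random(0)
    grid = [0.25 * j for j in range(-16, 17)]  # illustrative thresholds
    for n in (100, 10000):
        s = [rng.gauss(0.0, 1.0) for _ in range(n)]
        print(n, max_deviation(s, grid))
```

Running the sketch with growing $n$ gives a feel for the concentration statement that follows; it is not a substitute for the proof, since a grid only approximates the supremum.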

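An almost equivalent "double-sided" definition of $\epsilon$-typical Gaussian sequence is as follows. First, for any $-\infty \le S < T \le \infty$, we denote by

$$n^l_{s^n}(S,T) = \frac{1}{n}\sum_{i=1}^n 1(S \le s_i < T)\, s_i^l$$

the empirical $l$-th moment of the entries of $s^n$ within the interval $[S, T)$. Similar to that in Definition 4, we define the typical set $S^*_\epsilon(n)$ as the set of all sequences $s^n$ s.t.

$$\max_{l=0,1,2} \sup_{S<T} \Big| n^l_{s^n}(S,T) - \int_S^T s^l \frac{1}{\sqrt{2\pi}} e^{-\frac{s^2}{2}}\, ds \Big| \le \epsilon \tag{126}$$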



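We now illustrate the equivalence of the two typical sets $S_\epsilon(n)$ and $S^*_\epsilon(n)$. First, obviously $S^*_\epsilon(n) \subset S_\epsilon(n)$. Secondly,

\begin{align*}
\sup_{S<T} \Big| n^l_{s^n}(S,T) - \int_S^T s^l \frac{1}{\sqrt{2\pi}} e^{-\frac{s^2}{2}}\, ds \Big|
&= \sup_{S<T} \Big| n^l_{s^n}(S) - n^l_{s^n}(T) - \int_S^\infty s^l \frac{1}{\sqrt{2\pi}} e^{-\frac{s^2}{2}}\, ds + \int_T^\infty s^l \frac{1}{\sqrt{2\pi}} e^{-\frac{s^2}{2}}\, ds \Big| \\
&\le \sup_{S<T} \Big( \Big| n^l_{s^n}(S) - \int_S^\infty s^l \frac{1}{\sqrt{2\pi}} e^{-\frac{s^2}{2}}\, ds \Big| + \Big| n^l_{s^n}(T) - \int_T^\infty s^l \frac{1}{\sqrt{2\pi}} e^{-\frac{s^2}{2}}\, ds \Big| \Big) \\
&\le 2 \sup_{T} \Big| n^l_{s^n}(T) - \int_T^\infty s^l \frac{1}{\sqrt{2\pi}} e^{-\frac{s^2}{2}}\, ds \Big|
\end{align*}

This means $S_\epsilon(n) \subset S^*_{2\epsilon}(n)$, so the concentration of the "double-sided" and the "one-sided" typical sets are equivalent. We use the latter definition of $\epsilon$-typical set in the main body of the paper. However, for the sake of simplicity of notations, we prove the concentration of the $\epsilon$-typical set of the "one-sided" definition.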



Lemma 6: Concentration of Gaussian sequences: for an i.i.d. $N(0,1)$ random sequence $\mathsf{s}^n$, for all $\epsilon > 0$,

$$\lim_{n\to\infty} \Pr(\mathsf{s}^n \in S_\epsilon(n)) = 1 \tag{127}$$



Proof: we give a sketch of the proof here. The idea is to first quantize the real line for the Gaussian $N(0,1)$ random variable, then apply the concentration result for i.i.d. discrete finite random sequences. The quantization goes as follows. We study the intervals $\{(-\infty, -K\omega], [-K\omega, -(K-1)\omega], \ldots, [(K-1)\omega, K\omega], [K\omega, \infty)\}$, i.e. the end points of the intervals are defined as follows: for an integer $j$ within the range $[-K-1, K+1]$ we denote $\omega(j) = j\omega$ if $j = -K, \ldots, K$, and $\omega(-K-1) = -\infty$ and $\omega(K+1) = \infty$. We can obviously let $\omega$ be small enough and $K$ be big enough such that the following bounds hold for all $j = -K-1, \ldots, K$:

$$\Big| \int_{\omega(j)}^{\omega(j+1)} s^l \frac{1}{\sqrt{2\pi}} e^{-\frac{s^2}{2}}\, ds \Big| \le \frac{\epsilon}{2} \quad \text{for } l = 0, 1, 2. \tag{128}$$
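The claim that a fine enough grid makes every cell integral small can be checked directly. The following sketch, with hypothetical function names, computes the largest value of $|\int_{\omega(j)}^{\omega(j+1)} s^l \varphi(s)\, ds|$ over all cells and $l = 0, 1, 2$, using the closed-form tail integrals of the standard normal pdf $\varphi$. The choice of $\omega$ and $K$ in the usage example is illustrative only.

```python
import math


def tail(l, T):
    """Integral of s^l * phi(s) over [T, inf) for l in {0, 1, 2}; T may be +-inf."""
    if T == math.inf:
        return 0.0
    if T == -math.inf:
        return (1.0, 0.0, 1.0)[l]  # total mass, mean, second moment of N(0,1)
    phi = math.exp(-T * T / 2.0) / math.sqrt(2.0 * math.pi)
    q = 0.5 * math.erfc(T / math.sqrt(2.0))
    return (q, phi, T * phi + q)[l]


def max_cell_mass(omega, K):
    """Largest |integral of s^l phi(s)| over any cell of the grid, l = 0, 1, 2."""
    ends = [-math.inf] + [j * omega for j in range(-K, K + 1)] + [math.inf]
    return max(
        abs(tail(l, a) - tail(l, b))
        for a, b in zip(ends[:-1], ends[1:])
        for l in (0, 1, 2)
    )


if __name__ == "__main__":
    print(max_cell_mass(1.0, 1))      # coarse grid: cells carry a lot of mass
    print(max_cell_mass(0.01, 600))   # fine grid reaching out to +-6
```

A small $\omega$ controls the interior cells, while a large $K\omega$ controls the two unbounded end cells, which matches the roles of $\omega$ and $K$ in the proof sketch.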
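We let $S^{\omega}_{\epsilon}(n)$ be the set on which the typicality condition in (125) is true for $T = \omega(j)$ for all $j \in \{-K-1, \ldots, K+1\}$ simultaneously, i.e.

$$S^{\omega}_{\epsilon}(n) = \Big\{ s^n : \max_{l=0,1,2} \sup_{j=-K-1,\ldots,K+1} \Big| n^l_{s^n}(\omega(j)) - \int_{\omega(j)}^\infty s^l \frac{1}{\sqrt{2\pi}} e^{-\frac{s^2}{2}}\, ds \Big| \le \epsilon \Big\} \tag{129}$$

We show that

$$\lim_{n\to\infty} \Pr(\mathsf{s}^n \in S^{\omega}_{\epsilon}(n)) = 1 \tag{130}$$

This is true because from the weak law of large numbers we know that for $l = 0, 1, 2$:

$$\lim_{n\to\infty} \Pr\Big( \Big| n^l_{\mathsf{s}^n}(T) - \int_T^\infty s^l \frac{1}{\sqrt{2\pi}} e^{-\frac{s^2}{2}}\, ds \Big| \le \epsilon \Big) = 1 \tag{131}$$

for all $T \in \mathcal{R} \cup \{-\infty, \infty\}$, in particular for all $T = \omega(j)$, $j = -K-1, \ldots, K+1$. This is a finite set, so

\begin{align}
\lim_{n\to\infty} \Pr(\mathsf{s}^n \in S^{\omega}_{\epsilon}(n))
&= \lim_{n\to\infty} \Pr\Big( \max_{l=0,1,2} \sup_{j=-K-1,\ldots,K+1} \Big| n^l_{\mathsf{s}^n}(\omega(j)) - \int_{\omega(j)}^\infty s^l \frac{1}{\sqrt{2\pi}} e^{-\frac{s^2}{2}}\, ds \Big| \le \epsilon \Big) \nonumber \\
&= 1 \tag{132}
\end{align}

(130) is proved. In particular:

$$\lim_{n\to\infty} \Pr(\mathsf{s}^n \in S^{\omega}_{\epsilon/2}(n)) = 1 \tag{133}$$

Now we are ready to use (133) to prove the lemma.

For any $s^n$ and a real number $T \in [\omega(j), \omega(j+1)]$, $j \in \{-K-1, \ldots, K\}$, obviously $n^l_{s^n}(T) \in [n^l_{s^n}(\omega(j+1)), n^l_{s^n}(\omega(j))]$, so for $l = 0, 1, 2$:

\begin{align}
n^l_{s^n}(T) - \int_T^\infty s^l \frac{1}{\sqrt{2\pi}} e^{-\frac{s^2}{2}}\, ds
&\le n^l_{s^n}(\omega(j)) - \int_T^\infty s^l \frac{1}{\sqrt{2\pi}} e^{-\frac{s^2}{2}}\, ds \nonumber \\
&= n^l_{s^n}(\omega(j)) - \int_{\omega(j)}^\infty s^l \frac{1}{\sqrt{2\pi}} e^{-\frac{s^2}{2}}\, ds + \int_{\omega(j)}^T s^l \frac{1}{\sqrt{2\pi}} e^{-\frac{s^2}{2}}\, ds \nonumber \\
&\le n^l_{s^n}(\omega(j)) - \int_{\omega(j)}^\infty s^l \frac{1}{\sqrt{2\pi}} e^{-\frac{s^2}{2}}\, ds + \frac{\epsilon}{2} \tag{134}
\end{align}

where (134) follows from (128). Similarly we have for $l = 0, 1, 2$:

$$n^l_{s^n}(T) - \int_T^\infty s^l \frac{1}{\sqrt{2\pi}} e^{-\frac{s^2}{2}}\, ds \ge n^l_{s^n}(\omega(j+1)) - \int_{\omega(j+1)}^\infty s^l \frac{1}{\sqrt{2\pi}} e^{-\frac{s^2}{2}}\, ds - \frac{\epsilon}{2} \tag{135}$$

(134) and (135) tell us that for $l = 0, 1, 2$:

$$\Big| n^l_{s^n}(T) - \int_T^\infty s^l \frac{1}{\sqrt{2\pi}} e^{-\frac{s^2}{2}}\, ds \Big| \le \sup_{j=-K-1,\ldots,K+1} \Big| n^l_{s^n}(\omega(j)) - \int_{\omega(j)}^\infty s^l \frac{1}{\sqrt{2\pi}} e^{-\frac{s^2}{2}}\, ds \Big| + \frac{\epsilon}{2} \tag{136}$$

Given the definitions of $S_\epsilon(n)$ and $S^{\omega}_{\epsilon/2}(n)$, (136) implies that $S_\epsilon(n) \supset S^{\omega}_{\epsilon/2}(n)$, hence:

$$\lim_{n\to\infty} \Pr(\mathsf{s}^n \in S_\epsilon(n)) \ge \lim_{n\to\infty} \Pr(\mathsf{s}^n \in S^{\omega}_{\epsilon/2}(n)) = 1$$

The lemma is proved. □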
E. Properties of $(u+v)D(\frac{u}{u+v}\|p)$

In this section we show some properties of $(u+v)D(\frac{u}{u+v}\|p)$, summarized in the following lemma.

Lemma 7: If $u, v > 0$ and $\frac{u}{u+v} > p$, then $(u+v)D(\frac{u}{u+v}\|p)$ is monotonically increasing with $u$ and monotonically decreasing with $v$.

Proof: First, both $u+v$ and $D(\frac{u}{u+v}\|p)$ are positive and monotonically increasing with $u$ if $\frac{u}{u+v} > p$. Hence $(u+v)D(\frac{u}{u+v}\|p)$ is monotonically increasing with $u$.

Secondly, using basic calculus, we have:

\begin{align}
\frac{d\,(u+v)D(\frac{u}{u+v}\|p)}{dv}
&= \frac{d\big(u\log\frac{u}{(u+v)p} + v\log\frac{v}{(u+v)(1-p)}\big)}{dv} \nonumber \\
&= -\frac{u}{u+v} - \frac{v}{u+v} + 1 + \log\frac{v}{(u+v)(1-p)} \nonumber \\
&= \log\frac{1 - \frac{u}{u+v}}{1-p} \nonumber \\
&< 0 \tag{137}
\end{align}

The last inequality is true because $\frac{u}{u+v} > p$, hence $1 - \frac{u}{u+v} < 1 - p$. □
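The two monotonicity claims and the closed-form derivative can be sanity-checked numerically. The sketch below is an illustration with hypothetical function names and sample values; it measures the divergence in bits, in which case the non-logarithmic terms of the derivative cancel and the derivative with respect to $v$ is exactly $\log_2\frac{1 - \frac{u}{u+v}}{1-p}$.

```python
import math


def kl_bernoulli(q, p):
    """D(q || p) in bits between Bernoulli(q) and Bernoulli(p), 0 < p < 1."""
    total = 0.0
    if q > 0.0:
        total += q * math.log2(q / p)
    if q < 1.0:
        total += (1.0 - q) * math.log2((1.0 - q) / (1.0 - p))
    return total


def g(u, v, p):
    """(u + v) * D(u / (u + v) || p)."""
    return (u + v) * kl_bernoulli(u / (u + v), p)


def dg_dv(u, v, p):
    """Closed-form derivative of g with respect to v, as in (137)."""
    return math.log2((1.0 - u / (u + v)) / (1.0 - p))


if __name__ == "__main__":
    u, v, p, h = 3.0, 2.0, 0.3, 1e-5  # u / (u + v) = 0.6 > p
    numeric = (g(u, v + h, p) - g(u, v - h, p)) / (2 * h)
    print(numeric, dg_dv(u, v, p))
    print(g(u + 0.1, v, p) > g(u, v, p), g(u, v + 0.1, p) < g(u, v, p))
```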